Homomorphic Evaluation Including Key Switching, Modulus Switching, And Dynamic Noise Management

ABSTRACT

A homomorphic evaluation of a function is performed on input ciphertext(s), which were encrypted using an encryption scheme that includes multiple integer moduli. Each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer. For each ciphertext, which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q. Content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X). Performing the homomorphic evaluation of the function further includes performing operation(s) using one or more matrices from one or more of the ciphertexts.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a divisional of application Ser. No. 13/746,713, filed on Jan. 22, 2013, which claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 61/600,265, filed on Feb. 17, 2012, the disclosures of which are hereby incorporated by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Contract No.: FA8750-11-C-0096 (Defense Advanced Research Projects Agency (DARPA)). This invention was made with Government support under agreement FA8750-11-2-0079 from DARPA and the Air Force Research Laboratory (AFRL). The Government has certain rights in this invention.

BACKGROUND

This invention relates generally to encryption techniques and, more specifically, relates to homomorphic encryption techniques.

This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section. Acronyms that appear in the text or drawings are defined below, prior to the claims.

In his breakthrough result, Gentry demonstrated that fully-homomorphic encryption was theoretically possible, assuming the hardness of some problems in integer lattices. See [13] below, in a section entitled “References”. A reference or references is or are indicated by a number within square brackets or multiple numbers within square brackets, respectively. Since then, many different improvements have been made; for example, authors have proposed new variants, improved efficiency, suggested other hardness assumptions, and the like. Some of these works were accompanied by implementation, but all the implementations so far were either “proofs of concept” that can compute only one basic operation at a time (e.g., at great cost), or special-purpose implementations limited to evaluating very simple functions. See [26, 14, 8, 27, 19, 9].

BRIEF SUMMARY

In an exemplary embodiment, a method is disclosed that includes performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys and a plurality of moduli, where the moduli are integers. Performing the homomorphic evaluation of the function comprises performing one or more operations on the input ciphertexts. Performing the one or more operations comprises: performing a key-switching transformation on selected ones of the one or more input ciphertexts, where performing a key-switching transformation on a selected ciphertext comprises converting a first version of the selected ciphertext with respect to a first of the plurality of secret keys and a first modulus to a second version of the selected ciphertext with respect to a second of the plurality of secret keys and a second modulus, where the second modulus is an integer factor p times the first modulus, where p>1, and where each of the key switching transformations is performed prior to or after the one or more operations are evaluated; and outputting one or more results of the one or more operations.

An apparatus includes one or more memories comprising computer-readable program code and one or more processors. The one or more processors are configured, responsive to execution of the computer-readable program code, to cause the apparatus to perform the method of the preceding paragraph. A computer program product includes a computer readable storage medium having computer readable program code embodied therewith, the computer readable code comprising code for performing the method of the preceding paragraph.

An apparatus comprises means for performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys and a plurality of moduli, where the moduli are integers, and where the means for performing the homomorphic evaluation of the function comprises means for performing one or more operations on the input ciphertexts, and where the means for performing the one or more operations comprises: means for performing a key-switching transformation on selected ones of the one or more input ciphertexts, where performing a key-switching transformation on a selected ciphertext comprises converting a first version of the selected ciphertext with respect to a first of the plurality of secret keys and a first modulus to a second version of the selected ciphertext with respect to a second of the plurality of secret keys and a second modulus, where the second modulus is an integer factor p times the first modulus, where p>1, where each of the key switching transformations is performed prior to or after the one or more operations are evaluated; and means for outputting one or more results of the one or more operations.

Another method is described that includes performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys. Each input ciphertext comprises a plurality of real numbers that are kept with finite precision. Performing the homomorphic evaluation of the function comprises performing one or more operations, where performing each of the one or more operations comprises: performing a key-switching transformation on selected ones of the one or more input ciphertexts, where performing a key-switching transformation on a selected ciphertext comprises converting a first version of the selected ciphertext with respect to a first of the plurality of secret keys and with some number r bits of precision to a second version of the selected ciphertext with respect to a second of the plurality of secret keys and with some other number r′ bits of precision, where r′>r, where each of the key-switching transformations is performed prior to or after the one or more operations are evaluated; and outputting one or more results of the one or more operations.

An apparatus includes one or more memories comprising computer-readable program code and one or more processors. The one or more processors are configured, responsive to execution of the computer-readable program code, to cause the apparatus to perform the method of the preceding paragraph. A computer program product includes a computer readable storage medium having computer readable program code embodied therewith, the computer readable code comprising code for performing the method of the preceding paragraph.

Another apparatus is described that includes means for performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys. Each input ciphertext comprises a plurality of real numbers that are kept with finite precision. The means for performing the homomorphic evaluation of the function comprises means for performing one or more operations, where the means for performing each of the one or more operations comprises: means for performing a key-switching transformation on selected ones of the one or more input ciphertexts, where performing a key-switching transformation on a selected ciphertext comprises converting a first version of the selected ciphertext with respect to a first of the plurality of secret keys and with some number r bits of precision to a second version of the selected ciphertext with respect to a second of the plurality of secret keys and with some other number r′ bits of precision, where r′>r, where each of the key-switching transformations is performed prior to or after the one or more operations are evaluated; and means for outputting one or more results of the one or more operations.

An additional exemplary embodiment is a method that includes performing a homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using an encryption scheme that includes a plurality of integer moduli, where each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer, where each ciphertext which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q, and where content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X), and where performing the homomorphic evaluation of the function further comprises performing one or more operations using one or more matrices from one or more of the ciphertexts.

An apparatus includes one or more memories comprising computer-readable program code and one or more processors. The one or more processors are configured, responsive to execution of the computer-readable program code, to cause the apparatus to perform the method of the preceding paragraph. A computer program product includes a computer readable storage medium having computer readable program code embodied therewith, the computer readable code comprising code for performing the method of the preceding paragraph.

An additional exemplary embodiment is an apparatus that includes means for performing a homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using an encryption scheme that includes a plurality of integer moduli, where each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer, where each ciphertext which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q, and where content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X), and where the means for performing the homomorphic evaluation of the function further comprises means for performing one or more operations using one or more matrices from one or more of the ciphertexts.

A further method is disclosed that includes performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys and a plurality of moduli. The moduli are integers. Performing the homomorphic evaluation comprises performing one or more operations, where performing each of the one or more operations comprises: selecting one or more ciphertexts and determining an estimate of noise in the selected ciphertexts; for each one of the selected ciphertexts, in response to a determination that the noise magnitude meets at least one criterion, performing a modulus switching operation on the ciphertext to convert the ciphertext from one of the plurality of secret keys and a first modulus into a second ciphertext with respect to a same secret key but a second modulus, and updating the noise estimate following the modulus switching operation; performing one or more additional homomorphic evaluation operations on the selected ciphertexts; computing the noise estimate for the result of the homomorphic operation from the noise estimate of the selected one or more ciphertexts; and outputting the result of the homomorphic operation together with its noise estimate.

An apparatus includes one or more memories comprising computer-readable program code and one or more processors. The one or more processors are configured, responsive to execution of the computer-readable program code, to cause the apparatus to perform the method of the preceding paragraph. A computer program product includes a computer readable storage medium having computer readable program code embodied therewith, the computer readable code comprising code for performing the method of the preceding paragraph.

A further apparatus is disclosed that includes means for performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys and a plurality of moduli. The moduli are integers. The means for performing the homomorphic evaluation comprises means for performing one or more operations, where the means for performing each of the one or more operations comprises: means for selecting one or more ciphertexts and determining an estimate of noise in the selected ciphertexts; means, for each one of the selected ciphertexts and responsive to a determination that the noise magnitude meets at least one criterion, for performing a modulus switching operation on the ciphertext to convert the ciphertext from one of the plurality of secret keys and a first modulus into a second ciphertext with respect to a same secret key but a second modulus, and means for updating the noise estimate following the modulus switching operation; means for performing one or more additional homomorphic evaluation operations on the selected ciphertexts; means for computing the noise estimate for the result of the homomorphic operation from the noise estimate of the selected one or more ciphertexts; and means for outputting the result of the homomorphic operation together with its noise estimate.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 illustrates a block diagram of an exemplary system in which various exemplary embodiments of the invention may be implemented;

FIG. 2 illustrates a simple block diagram of a requestor and a server, such as a search engine, that use the fully homomorphic encryption scheme in accordance with possible exemplary embodiments of this invention;

FIGS. 3A, 3B and 4 are logic flow diagrams that illustrate the operation of an exemplary method, a result of execution of computer program instructions embodied on a computer readable memory, and/or functions performed by logic implemented in hardware, in accordance with exemplary embodiments of this invention;

FIG. 5 shows pseudo-code for exemplary modulus switching;

FIG. 6 shows pseudo-code for an exemplary SwitchKey procedure;

FIG. 7 shows pseudo-code for an exemplary multiplication procedure;

FIG. 8 is a table of results for k=80-bits of security and for several different depth parameters L;

FIG. 9 is a table having concrete values for two situations for experiments, where the first situation corresponds to performing arithmetic on bytes in $\mathbb{F}_{2^8}$ (i.e., n=8), and the second situation corresponds to arithmetic on bits in $\mathbb{F}_2$ (i.e., n=1); and

FIG. 10 is a logic flow diagram that illustrates the operation of an exemplary method, a result of execution of computer program instructions embodied on a computer readable memory, and/or functions performed by logic implemented in hardware, in accordance with exemplary embodiments of this invention.

DETAILED DESCRIPTION

Before proceeding with additional description of the exemplary embodiments, it is helpful to provide an overview of a system in which the exemplary embodiments may be performed and exemplary operations performed by such a system. A system herein performs homomorphic evaluation of ciphertext in order to perform operations on the ciphertext. The homomorphic evaluation is performed without the secret key used to encrypt the ciphertext.

Turning to FIG. 1, this figure illustrates a block diagram of an exemplary system in which various exemplary embodiments of the invention may be implemented. The system 100 may include at least one circuitry 102 (such as an integrated circuit) that may in certain exemplary embodiments include one or more processors 104. The system 100 may also include one or more memories 106 (e.g., a volatile memory device, a non-volatile memory device), and may include at least one storage 108. The storage 108 may include a non-volatile memory device such as a magnetic disk drive, an optical disk drive and/or a tape drive, as non-limiting examples. The storage 108 may comprise an internal storage device, an attached storage device and/or a network accessible storage device, as non-limiting examples. The system 100 may include program logic 110 including code 112 (e.g., computer-readable program code) that may be loaded into the memory 106 and executed by the processor 104 and/or circuitry 102. In certain exemplary embodiments, the program logic 110, including code 112, may be stored in the storage 108. In certain other exemplary embodiments, the program logic 110 may be implemented in the circuitry 102. Therefore, while FIG. 1 shows the program logic 110 separately from the other elements, the program logic 110 may be implemented in the memory 106 and/or the circuitry 102, as non-limiting examples.

The system 100 may include at least one communications component 114 that enables communication with at least one other component, system, device and/or apparatus. As non-limiting examples, the communications component 114 may include a transceiver configured to send and receive information, a transmitter configured to send information and/or a receiver configured to receive information. As a non-limiting example, the communications component 114 may comprise a modem or network card. The system 100 of FIG. 1 may be embodied in a computer or computer system, such as a desktop computer, a portable computer or a server, as non-limiting examples. The components of the system 100 shown in FIG. 1 may be connected or coupled together using one or more internal buses, connections, wires and/or (printed) circuit boards, as non-limiting examples.

It should be noted that in accordance with the exemplary embodiments of the invention, one or more of the circuitry 102, processor(s) 104, memory 106, storage 108, program logic 110 and/or communications component 114 may store one or more of the various items (e.g., public/private key(s), ciphertexts, encrypted items, matrices, variables, equations, formula, operations, operational logic, logic) discussed herein. As a non-limiting example, one or more of the above-identified components may receive and/or store the plaintext (e.g., to be encrypted or resulting from decryption) and/or the ciphertext (e.g., to be decrypted, to be operated on homomorphically, or resulting from encryption). As a further non-limiting example, one or more of the above-identified components may receive and/or store the encryption function(s) and/or the decryption function(s), as described herein.

The exemplary embodiments of this invention may be carried out by computer software implemented by the processor 104 or by hardware, or by a combination of hardware and software. As a non-limiting example, the exemplary embodiments of this invention may be implemented by one or more integrated circuits. The memory 106 may be of any type appropriate to the technical environment and may be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor based memory devices, fixed memory and removable memory, as non-limiting examples. The processor(s) 104 may be of any type appropriate to the technical environment, and may encompass one or more of microprocessors, general purpose computers, special purpose computers and processors based on a multi-core architecture, as non-limiting examples.

Homomorphic evaluation using a homomorphic encryption scheme has numerous applications. For example, it enables private search engine queries where the search engine responds to a query without knowledge of the query, i.e., a search engine can provide a succinct encrypted answer to an encrypted (e.g., Boolean) query without knowing what the query was. It also enables searching on encrypted data; one can store encrypted data on a remote server and later have the server retrieve only files that (when decrypted) satisfy some Boolean constraint, even though the server cannot decrypt the files on its own. More broadly, homomorphic encryption may improve the efficiency of secure multiparty computation.

One non-limiting application of homomorphic evaluation using a homomorphic encryption scheme is in a two-party setting. As previously described, a simple example is making encrypted queries to search engines. Referring to FIG. 2, to perform an encrypted search a party (requestor 1) generates a public key pk (and a plurality, N, of secret keys, s^(k)) for the homomorphic encryption scheme, and generates ciphertexts c₁, . . . , c_(t) that encrypt the query π₁, . . . , π_(t) under pk. For example, each π_(i) could be a single bit of the query. Now, let the circuit C express a search engine server 2 search function for data stored in storage 3. The server 2 sets c*_(i)←Evaluate(pk, C_(i), c₁, . . . , c_(t)), where C_(i) is the sub-circuit of C that computes the i'th bit of the output. Note that, in practice, the evaluation of c*_(i) and c*_(j) may share intermediate results, in which case it may be needlessly inefficient to run independent instances of the Evaluate algorithm. The server 2 sends these ciphertexts to the requestor 1. It is known that Decrypt(s^(k), c*_(i))=C_(i)(π₁, . . . , π_(t)). These latter values constitute precisely the answer to the query, which is recoverable through decryption.

As another non-limiting application, the exemplary embodiments of this invention enable searching over encrypted data. In this scenario, assume that the requestor 1 stores files on the server 2 (e.g., on the Internet), so that the requestor 1 can conveniently access these files without needing the requestor's computer. However, the requestor encrypts the files, since otherwise the server 2 could potentially read the private data. Let bits π₁, . . . , π_(l) represent the files, which are encrypted in the ciphertexts c₁, . . . , c_(l). Assume then that the requestor 1 later wants to download all encrypted files that satisfy a query, e.g., all files containing the word ‘homomorphic’ within 5 words of ‘encryption’, but not the word ‘evoting’. The requestor 1 sends the query to the server 2, which expresses it as a circuit C. The server sets c*_(i)←Evaluate(pk, C_(i), c₁, . . . , c_(l)) and sends these ciphertexts to the requestor 1, which decrypts the returned ciphertexts to recover C(π₁, . . . , π_(l)), the (bits of the) files that satisfy the query.

Note that in this application, as in the encrypted search application, the requestor provides the number of bits that the response should have, and the encrypted response from the server 2 is padded or truncated to meet the upper bound.

Concerning additional description of the exemplary embodiments, this disclosure describes the first implementation powerful enough to support an “interesting real world circuit”. In an example, a variant of the leveled FHE-without-bootstrapping scheme of [5] is implemented, with support for deep enough circuits so that one can evaluate an entire AES-128 encryption operation. For this implementation, both AES-specific optimizations and several “generic” tools for FHE evaluation are developed. These last tools include (among others) a different variant of the Brakerski-Vaikuntanathan key-switching technique that does not require reducing the norm of the ciphertext vector, and a method of implementing the Brakerski-Gentry-Vaikuntanathan (BGV) modulus-switching transformation on ciphertexts in CRT representation.

For ease of reference, the instant disclosure is separated into sections.

1 INTRODUCTION

An exemplary implementation is based on a variant of the BGV scheme [5, 7, 6] (based on ring-LWE [22]), using the techniques of Smart and Vercauteren (SV) [27] and Gentry, Halevi and Smart (GHS) [15], and many new optimizations are introduced herein. Some of our optimizations are specific to AES, and these are described in Section 4. Most of our optimizations, however, are more general-purpose and can be used for homomorphic evaluation of other circuits, and these are described in Section 3.

Since the cryptosystem is defined over a polynomial ring, many of the operations involve various manipulation of integer polynomials, such as modular multiplications and additions and Frobenius maps. Most of these operations can be performed more efficiently in evaluation representation, when a polynomial is represented by the vector of values that it assumes in all the roots of the ring polynomial (for example polynomial multiplication is just point-wise multiplication of the evaluation values). On the other hand some operations in BGV-type cryptosystems (such as key switching and modulus switching) seem to require coefficient representation, where a polynomial is represented by listing all its coefficients. The need for coefficient representation ultimately stems from the fact that the noise in the ciphertexts is small in coefficient representation but not in evaluation representation. Hence a “naive implementation” of FHE would need to convert the polynomials back and forth between the two representations, and these conversions turn out to be the most time-consuming part of the execution. In our implementation we keep ciphertexts in evaluation representation at all times, converting to coefficient representation only when needed for some operation, and then converting back. Many of our general-purpose optimizations are aimed at reducing the number of FFTs and CRTs that we need to perform, by reducing the number of times that we need to convert polynomials between coefficient and evaluation representations.

We describe variants of key switching and modulus switching that can be implemented while keeping almost all the polynomials in evaluation representation. Our key-switching variant has another advantage, in that it significantly reduces the size of the key-switching matrices in the public key. This is particularly important since the main limiting factor for evaluating deep circuits turns out to be the ability to keep the key-switching matrices in memory. Other optimizations that we present are meant to reduce the number of modulus switching and key switching operations that we need to do. This is done by tweaking some operations (such as multiplication by constant) to get a slower noise increase, by “batching” some operations before applying key switching, and by attaching to each ciphertext an estimate of the “noisiness” of this ciphertext, in order to support better noise bookkeeping.

An exemplary implementation was based in 2011 on the NTL C++ library running over GMP, and we utilized a machine which consisted of a processing unit of Intel Xeon CPUs running at 2.0 GHz with 18 MB cache, and most importantly with 256 GB of RAM. It is expected that processing and memory requirements will be reduced over time.

Memory was our main limiting factor in the implementation. With this machine it took us just under two days to compute a single block AES encryption using an implementation choice which minimizes the amount of memory required; this is roughly two orders of magnitude faster than what could be done with the Gentry-Halevi implementation [14]. The computation was performed on ciphertexts that could hold 864 plaintext slots each, where each slot holds an element of $\mathbb{F}_{2^8}$. This means that we can compute ⌊864/16⌋=54 AES operations in parallel, which gives an amortized time per block of roughly forty minutes. A second (byte-sliced) implementation, requiring more memory, completed an AES operation in around five days, where ciphertexts could hold 720 different $\mathbb{F}_{2^8}$ slots (hence we can evaluate 720 blocks in parallel). This results in an amortized time per block of roughly five minutes.

We note that there are a multitude of optimizations that one can perform on our basic implementation. Most importantly, we believe that by using the “bootstrapping as optimization” technique from BGV [5] we can speed up the AES performance by an additional order of magnitude. Also, there are great gains to be had by making better use of parallelism. Unfortunately, the NTL library (which serves as an exemplary underlying software platform) is not thread safe, which severely limits our ability to utilize the multi-core functionality of modern multi-core processors. We expect that by utilizing many threads we can speed up some of our (higher memory) AES variants by as much as a 16× factor, just by letting each thread compute a different S-box lookup.

Regarding organization of the rest of this disclosure, in Section 2 we review the main features of BGV-type cryptosystems [6, 5], and briefly survey the techniques for homomorphic computation on packed ciphertexts from SV and GHS [27, 15]. Then in Section 3 we describe our “general-purpose” optimizations on a high level, with additional details provided in Appendices 5 and 6. A brief overview of AES and a high-level description and performance numbers is provided in Section 4.

2 BACKGROUND

2.1 Notations and Mathematical Background

For an integer q we identify the ring $\mathbb{Z}/q\mathbb{Z}$ with the interval $(-q/2, q/2] \cap \mathbb{Z}$, and use $[z]_q$ to denote the reduction of the integer z modulo q into that interval. Our implementation utilizes polynomial rings defined by cyclotomic polynomials, $\mathbb{A} = \mathbb{Z}[X]/\Phi_m(X)$. The ring $\mathbb{A}$ is the ring of integers of the m-th cyclotomic number field $\mathbb{Q}(\zeta_m)$. We let

$\mathbb{A}_q \overset{\mathrm{def}}{=} \mathbb{A}/q\mathbb{A} = \mathbb{Z}[X]/(\Phi_m(X), q)$

for the (possibly composite) integer q, and we identify $\mathbb{A}_q$ with the set of integer polynomials of degree up to φ(m)−1 reduced modulo q.
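
The two reductions just defined can be illustrated with a minimal Python sketch. The function names `centered_mod` and `poly_reduce`, the coefficient-list encoding (lowest degree first), and the toy parameters m=5, q=11 are assumptions made only for this illustration; they are not taken from the exemplary implementation.

```python
def centered_mod(z: int, q: int) -> int:
    """Return [z]_q, the representative of z modulo q in the interval (-q/2, q/2]."""
    r = z % q                      # Python's % yields a value in [0, q) for q > 0
    return r - q if r > q // 2 else r


def poly_reduce(a, phi, q):
    """Reduce an integer polynomial modulo a monic polynomial phi and modulo q.

    Polynomials are coefficient lists, lowest degree first (an assumed encoding).
    """
    a = list(a)
    n = len(phi) - 1               # degree of phi
    # Cancel every term of degree >= n by subtracting a multiple of X^(k-n) * phi.
    for k in range(len(a) - 1, n - 1, -1):
        c = a[k]
        if c:
            for j in range(n + 1):
                a[k - n + j] -= c * phi[j]
    a = a[:n] + [0] * max(0, n - len(a))   # exactly n = deg(phi) coefficients
    return [centered_mod(c, q) for c in a]


# Toy example: m = 5, so Phi_5(X) = 1 + X + X^2 + X^3 + X^4 and phi(m) = 4.
PHI_5 = [1, 1, 1, 1, 1]
x_to_the_5 = [0, 0, 0, 0, 0, 1]
print(poly_reduce(x_to_the_5, PHI_5, 11))  # X^5 = 1 in the ring, so [1, 0, 0, 0]
```

The result always has φ(m) coefficients, each in (−q/2, q/2], which matches the identification of the ring modulo q with integer polynomials of degree up to φ(m)−1 reduced modulo q.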

Coefficient vs. Evaluation Representation.

Let m, q be two integers such that $\mathbb{Z}/q\mathbb{Z}$ contains a primitive m-th root of unity, and denote one such primitive m-th root of unity by $\zeta \in \mathbb{Z}/q\mathbb{Z}$. Recall that the m-th cyclotomic polynomial splits into linear terms modulo q,

$\Phi_m(X) = \prod_{i \in (\mathbb{Z}/m\mathbb{Z})^*} (X - \zeta^i) \pmod{q}.$

We consider two ways of representing an element $a \in \mathbb{A}_q$. In the first, we view a as a degree-(φ(m)−1) polynomial, $a(X) = \sum_{i<\phi(m)} a_i X^i$; the coefficient representation of a just lists all the coefficients in order, $\mathbf{a} = \langle a_0, \ldots, a_{\phi(m)-1} \rangle \in (\mathbb{Z}/q\mathbb{Z})^{\phi(m)}$. For the other representation, we consider the values that the polynomial a(X) assumes on all primitive m-th roots of unity modulo q, $b_i = a(\zeta^i) \bmod q$ for $i \in (\mathbb{Z}/m\mathbb{Z})^*$. The $b_i$'s in order also yield a vector $\mathbf{b}$, which we call the evaluation representation of a. Clearly these two representations are related via $\mathbf{b} = V_m \cdot \mathbf{a}$, where $V_m$ is the Vandermonde matrix over the primitive m-th roots of unity modulo q. We remark that for all i we have the equality $a \bmod (X - \zeta^i) = a(\zeta^i) = b_i$, hence the evaluation representation of a is just a polynomial Chinese-Remaindering representation.

In both representations, an element aεA_(q) is represented by a φ(m)-vector of integers in Z/qZ. If q is composite then each of these integers can itself be represented either using the standard binary encoding of integers or using Chinese-Remaindering relative to the factors of q. We usually use the standard binary encoding for the coefficient representation and Chinese-Remaindering for the evaluation representation. (Hence the latter representation is really a double CRT representation, relative to both the polynomial factors of Φ_(m)(X) and the integer factors of q.)
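The relation between the two representations can be checked on a toy instance. The following is a minimal sketch only: the parameters m=5, q=11 and ζ=3 (an element of order 5 modulo 11), as well as all function names, are assumptions chosen for illustration and are not taken from the implementation. It shows that multiplication modulo (Φ_(m)(X), q) in coefficient representation corresponds to entry-wise multiplication in evaluation representation.

```python
from math import gcd

M, Q, ZETA = 5, 11, 3            # toy values (assumed): 3 has order 5 modulo 11
UNITS = [i for i in range(1, M) if gcd(i, M) == 1]   # the index set (Z/mZ)*
PHI = [1, 1, 1, 1, 1]            # Phi_5(X) = 1 + X + X^2 + X^3 + X^4, low degree first

def to_eval(a):
    """Evaluation representation: b_i = a(zeta^i) mod q for i in (Z/mZ)*."""
    return [sum(c * pow(ZETA, i * k, Q) for k, c in enumerate(a)) % Q for i in UNITS]

def mul_mod_phi(a, b):
    """Product of two coefficient vectors modulo (Phi_m(X), q)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % Q
    # Long division by the monic polynomial Phi_m, keeping only the remainder.
    d = len(PHI) - 1
    for k in range(len(prod) - 1, d - 1, -1):
        lead = prod[k]
        for j, f in enumerate(PHI):
            prod[k - d + j] = (prod[k - d + j] - lead * f) % Q
    return prod[:d]

a = [1, 2, 0, 3]                 # a(X) = 1 + 2X + 3X^3
b = [4, 0, 1, 1]                 # b(X) = 4 + X^2 + X^3
product_in_eval = [(x * y) % Q for x, y in zip(to_eval(a), to_eval(b))]
```

Here `product_in_eval` agrees with `to_eval(mul_mod_phi(a, b))`, which is the property that makes the evaluation representation attractive for homomorphic multiplication.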

2.2 BGV-type Cryptosystems

An exemplary embodiment uses a variant of the BGV cryptosystem due to Gentry, Halevi and Smart, specifically the one described in [15, Appendix D] (in the full version). In this cryptosystem both ciphertexts and secret keys are vectors over the polynomial ring A, and the native plaintext space is the space of binary polynomials A₂. (More generally the plaintext space could be A_(p) for some fixed p≧2, but in our case we will use A₂.)

At any point during the homomorphic evaluation there is some “current integer modulus q” and “current secret key s”, which change from time to time. A ciphertext c is decrypted using the current secret key s by taking inner product over A_(q) (with q the current modulus) and then reducing the result modulo 2 in coefficient representation. Namely, the decryption formula is

a←[[⟨c,s⟩ mod Φ_(m)(X)]_(q)]₂.  (1)

The polynomial [⟨c,s⟩ mod Φ_(m)(X)]_(q) is called the “noise” in the ciphertext c. Informally, c is a valid ciphertext with respect to secret key s and modulus q if this noise has “sufficiently small norm” relative to q. The meaning of “sufficiently small norm” is whatever is needed to ensure that the noise does not wrap around q when performing homomorphic operations; in our implementation we keep the norm of the noise always below some pre-set bound (which is determined in Section 7.2).

Following [22, 15], the specific norm that we use to evaluate the magnitude of the noise is the “canonical embedding norm reduced mod q”; specifically, we use the conventions as described in [15, Appendix D] (in the full version). This is useful to get smaller parameters, but for the purpose of presentation the reader can think of the norm as the Euclidean norm of the noise in coefficient representation. More details are given in the Appendices. We refer to the norm of the noise as the noise magnitude.

The central feature of BGV-type cryptosystems is that the current secret key and modulus evolve as we apply operations to ciphertexts. We apply five different operations to ciphertexts during homomorphic evaluation. Three of them (addition, multiplication, and automorphism) are “semantic operations” that we use to evolve the plaintext data which is encrypted under those ciphertexts. The other two operations (key-switching and modulus-switching) are used for “maintenance”: These operations do not change the plaintext at all, they only change the current key or modulus (respectively), and they are mainly used to control the complexity of the evaluation. Below we briefly describe each of these five operations on a high level. For the sake of self-containment, we also describe key generation and encryption in Section 6. A more detailed description can be found in [15, Appendix D].

Addition

Homomorphic addition of two ciphertext vectors with respect to the same secret key and modulus q is done just by adding the vectors over A_(q). If the two arguments were encrypting the plaintext polynomials a₁, a₂εA₂, then the sum will be an encryption of a₁+a₂εA₂. This operation has no effect on the current modulus or key, and the norm of the noise is at most the sum of norms from the noise in the two arguments.
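Decryption formula (1) and the addition rule can be exercised on a toy instance. The sketch below is illustrative only and makes several assumptions that are not part of the described implementation: the ring is Z[X]/(X⁴+1) (that is, m=8), the modulus is q=257, the secret key has the form (1, s), and the helper `encrypt` is a simplified stand-in for the key generation and encryption procedures described elsewhere.

```python
import random

N, Q = 4, 257                    # ring Z[X]/(X^4 + 1), i.e. m = 8; toy values (assumed)

def centered(z, q):
    """Representative of z modulo q in the interval (-q/2, q/2]."""
    r = z % q
    return r - q if r > q // 2 else r

def poly_mul(a, b):
    """Multiply modulo (X^N + 1, Q), using the rule X^N = -1."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            sign = -1 if i + j >= N else 1
            out[(i + j) % N] = (out[(i + j) % N] + sign * x * y) % Q
    return out

def encrypt(msg, s, rng):
    """Return c = (c0, c1) with c0 + c1*s = msg + 2e over A_q for a small e."""
    c1 = [rng.randrange(Q) for _ in range(N)]
    e = [rng.randint(-1, 1) for _ in range(N)]
    c0 = [(m + 2 * ei - t) % Q for m, ei, t in zip(msg, e, poly_mul(c1, s))]
    return (c0, c1)

def decrypt(c, s):
    """Formula (1): reduce <c, (1, s)> into (-q/2, q/2], then reduce modulo 2."""
    inner = [(x + y) % Q for x, y in zip(c[0], poly_mul(c[1], s))]
    return [centered(v, Q) % 2 for v in inner]

def add(c, d):
    """Homomorphic addition: add the ciphertext vectors entry by entry over A_q."""
    return tuple([(x + y) % Q for x, y in zip(u, v)] for u, v in zip(c, d))

rng = random.Random(1)
s = [rng.randint(-1, 1) for _ in range(N)]
a1, a2 = [1, 0, 1, 1], [1, 1, 0, 1]
ct1, ct2 = encrypt(a1, s, rng), encrypt(a2, s, rng)
ct_sum = add(ct1, ct2)           # encrypts a1 + a2 in A_2
```

Because each noise term here has coefficients of magnitude at most 3, the sum stays far below q/2 and `decrypt(ct_sum, s)` recovers a₁+a₂ modulo 2.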

Multiplication

Homomorphic multiplication is done via tensor product over A_(q). In principle, if the two arguments have dimension n over A_(q) then the product ciphertext has dimension n², each entry in the output computed as the product of one entry from the first argument and one entry from the second. It was shown in [7] that over polynomial rings this operation can be implemented while increasing the dimension only to 2n−1 rather than the expected n².

This operation does not change the current modulus, but it changes the current key: If the two input ciphertexts are valid with respect to the dimension-n secret key vector s, encrypting the plaintext polynomials a₁, a₂εA₂, then the output is valid with respect to the dimension-n² secret key s′ which is the tensor product of s with itself, and it encrypts the polynomial a₁·a₂εA₂. The norm of the noise in the product ciphertext can be bounded in terms of the product of norms of the noise in the two arguments. For our choice of norm function, the norm of the product is no larger than the product of the norms of the two arguments.
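The multiplication rule can be illustrated with n=2, where the product ciphertext has dimension 2n−1=3 and is valid with respect to the key (1, s, s²). The following sketch is a simplification under stated assumptions: ring elements are replaced by plain integers modulo a toy modulus q=1009, and `encrypt` is a hypothetical helper, so only the arithmetic of the tensor product is shown.

```python
import random

Q = 1009                         # toy odd modulus (assumed)

def centered(z, q):
    """Representative of z modulo q in the interval (-q/2, q/2]."""
    r = z % q
    return r - q if r > q // 2 else r

def encrypt(a, s, rng):
    """Toy ciphertext (c0, c1) with c0 + c1*s = 2e + a (mod q) for a small e."""
    c1 = rng.randrange(Q)
    e = rng.randint(-2, 2)
    return ((a + 2 * e - c1 * s) % Q, c1)

def decrypt(c, key):
    """Inner product with the key vector, centered modulo q, then modulo 2."""
    return centered(sum(x * k for x, k in zip(c, key)), Q) % 2

def multiply(c, d):
    """Tensor product collapsed to dimension 2n - 1 = 3 for n = 2."""
    return ((c[0] * d[0]) % Q,
            (c[0] * d[1] + c[1] * d[0]) % Q,
            (c[1] * d[1]) % Q)

rng = random.Random(7)
s = 3
c, d = encrypt(1, s, rng), encrypt(1, s, rng)
product = multiply(c, d)         # valid under the key (1, s, s*s)
```

The noise of `product` is the product of the two input noise terms, which is why a key-switching step back to (1, s) and a modulus switch typically follow a multiplication.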

Key Switching

The public key of BGV-type cryptosystems includes additional components to enable converting a valid ciphertext with respect to one key into a valid ciphertext encrypting the same plaintext with respect to another key. For example, this is used to convert the product ciphertext which is valid with respect to a high-dimension key back to a ciphertext with respect to the original low-dimension key.

To allow conversion from dimension-n′ key s′ to dimension-n key s (both with respect to the same modulus q), we include in the public key a matrix W=W[s′→s] over A_(q), where the i'th column of W is roughly an encryption of the i'th entry of s′ with respect to s (and the current modulus). Then given a valid ciphertext c′ with respect to s′, we roughly compute c=W·c′ to get a valid ciphertext with respect to s.

In some more detail, the BGV key switching transformation first ensures that the norm of the ciphertext c′ itself is sufficiently low with respect to q. In [5] this was done by working with the binary encoding of c′, and one of our main optimizations in this work is a different method for achieving the same goal (cf. Section 3.1). Then, if the i'th entry in s′ is s′_(i)εA (with norm smaller than q), then the i'th column of W[s′→s] is an n-vector w_(i) such that [⟨w_(i), s⟩ mod Φ_(m)]_(q)=2e_(i)+s′_(i) for a low-norm polynomial e_(i)εA. Denoting e=(e₁, . . . , e_(n′)), this means that we have sW=s′+2e over A_(q). For any ciphertext vector c′, setting c=W·c′ over A_(q) we get the equation:

[⟨c,s⟩ mod Φ_(m)(X)]_(q)=[sWc′ mod Φ_(m)(X)]_(q)=[⟨c′,s′⟩+2⟨c′,e⟩ mod Φ_(m)(X)]_(q).

Since c′, e, and [⟨c′, s′⟩ mod Φ_(m)]_(q) all have low norm relative to q, the addition on the right-hand side does not cause a wrap around q, hence we get [[⟨c, s⟩ mod Φ_(m)]_(q)]₂=[[⟨c′, s′⟩ mod Φ_(m)]_(q)]₂, as needed. The key-switching operation changes the current secret key from s′ to s, and does not change the current modulus. The norm of the noise is increased by at most an additive factor of 2∥⟨c′, e⟩∥.

Modulus Switching

The modulus switching operation is intended to reduce the norm of the noise, to compensate for the noise increase that results from all the other operations. To convert a ciphertext c with respect to secret key s and modulus q into a ciphertext c′ encrypting the same plaintext with respect to the same secret key but modulus q′, we roughly just scale c by a factor q′/q (thus getting a fractional ciphertext), then round appropriately to get back an integer ciphertext. Specifically c′ is a ciphertext vector satisfying (a) c′≡c (mod 2), and (b) the “rounding error term” τ=c′−(q′/q)c has low norm. Converting c to c′ is easy in coefficient representation, and one of our exemplary optimizations is a method for doing the same in evaluation representation (cf. Section 3.2). This operation leaves the current key s unchanged, changes the current modulus from q to q′, and the norm of the noise is changed as |v′|≦(q′/q)|v|+∥τ·s∥. Note that if the key s has low norm and q′ is sufficiently smaller than q, then the noise magnitude decreases by this operation.

A BGV-type cryptosystem has a chain of moduli, q₀<q₁< . . . <q_(L-1), where fresh ciphertexts are with respect to the largest modulus q_(L-1). During homomorphic evaluation every time the (estimated) noise grows too large we apply modulus switching from q_(i) to q_(i−1) in order to decrease it back. Eventually we get ciphertexts with respect to the smallest modulus q₀, and we cannot compute on them anymore (except by using bootstrapping).
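In coefficient representation, the "scale and round appropriately" step can be sketched as follows. This is an illustrative sketch rather than the routine of the implementation: the function name and the toy moduli are assumptions, coefficients are taken as centered integer representatives, and the rounding rule used here (take the floor of the scaled value, then move to the neighbouring integer if the parity differs) is one simple way to satisfy conditions (a) and (b).

```python
def switch_modulus(c, q, q_new):
    """Scale integer coefficients by q_new/q, rounding to an integer of the same parity."""
    out = []
    for x in c:
        y = (x * q_new) // q     # floor of the scaled coefficient, in exact arithmetic
        if (y - x) % 2:          # wrong parity: the next integer up has the right one
            y += 1
        out.append(y)
    return out

q, q_new = 11 * 31 * 41, 11 * 31             # toy moduli (assumed), both odd
c = [1234, -567, 890, -4321]                 # centered coefficients modulo q
c_new = switch_modulus(c, q, q_new)
tau_times_q = [y * q - x * q_new for x, y in zip(c, c_new)]   # q * (rounding error term)
```

Each entry of the rounding error term has magnitude at most one, so `tau_times_q` is bounded by q in absolute value, and every entry of `c_new` has the same parity as the corresponding entry of `c`.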

Automorphisms

In addition to adding and multiplying polynomials, another useful operation is converting the polynomial a(X)εA to

$a^{(i)}(X)\overset{def}{=}a\left(X^{i}\right)\ \mathrm{mod}\ \Phi_{m}(X).$

Denoting by κ_(i) the transformation κ_(i):a↦a^((i)), it is a standard fact that the set of transformations {κ_(i):iε(Z/mZ)*} forms a group under composition (which is the Galois group Gal(Q(ζ_(m))/Q)), and this group is isomorphic to (Z/mZ)*. In [5, 15] it was shown that applying the transformations κ_(i) to the plaintext polynomials is very useful; some more examples of its use can be found in Section 4.

Denoting by c^((i)), s^((i)) the vector obtained by applying κ_(i) to each entry in c, s, respectively, it was shown in [5, 15] that if c is a valid ciphertext encrypting a with respect to key s and modulus q, then c^((i)) is a valid ciphertext encrypting a^((i)) with respect to key s^((i)) and the same modulus q. Moreover the norm of noise remains the same under this operation. We remark that we can apply key-switching to c^((i)) in order to get an encryption of a^((i)) with respect to the original key s.
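The transformation κ_(i) is easy to state in code. The sketch below is a toy illustration under assumptions: m=5 is prime (so Φ_(m)(X)=1+X+ . . . +X^(m−1) and reduction modulo Φ_(m) is a single subtraction), q=11, and the function names are hypothetical. It also shows the effect on the evaluation representation: evaluating a^((i)) at ζ^(j) gives the value of a at ζ^(ij), so κ_(i) permutes the evaluations.

```python
M, Q, ZETA = 5, 11, 3            # toy values (assumed): m prime, 3 of order 5 modulo 11

def automorphism(a, i):
    """kappa_i: a(X) -> a(X^i) mod (Phi_m(X), q), for prime m and i not divisible by m."""
    g = [0] * M
    for k, c in enumerate(a):
        g[(i * k) % M] = (g[(i * k) % M] + c) % Q   # X^(i*k) reduced using X^m = 1
    # Subtracting g[m-1] * Phi_m clears the top coefficient, which reduces modulo Phi_m.
    return [(c - g[M - 1]) % Q for c in g[:M - 1]]

def evaluate(a, x):
    """Value of the polynomial a at the point x, modulo q."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(a)) % Q

a = [1, 2, 0, 3]                 # a(X) = 1 + 2X + 3X^3
a2 = automorphism(a, 2)          # a(X^2) mod Phi_5
back = automorphism(a2, 3)       # kappa_3 after kappa_2 is kappa_6 = kappa_1
```

Since 2·3≡1 (mod 5), `back` equals `a`, reflecting the group structure (Z/mZ)* of the transformations.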

2.3 Computing on Packed Ciphertexts

Smart and Vercauteren observed [26, 27] that the plaintext space A₂ can be viewed as a vector of “plaintext slots”, by an application of the polynomial Chinese Remainder Theorem. Specifically, if the ring polynomial Φ_(m)(X) factors modulo 2 into a product of irreducible factors Φ_(m)(X)=Π_(j=0)^(l-1)F_(j)(X) (mod 2), then a plaintext polynomial a(X)εA₂ can be viewed as encoding l different small polynomials, a_(j)=a mod F_(j). Just like for integer Chinese Remaindering, addition and multiplication in A₂ correspond to element-wise addition and multiplication of the vectors of slots.

The effect of the automorphisms is a little more involved. When i is a power of two, the transformation κ_(i):a↦a^((i)) is applied to each slot separately. When i is not a power of two, the transformation κ_(i) has the effect of roughly shifting the values between the different slots. For example, for some parameters we could get a cyclic shift of the vector of slots: If a encodes the vector (a₀, a₁, . . . , a_(l-1)), then κ_(i)(a) (for some i) could encode the vector (a_(l-1), a₀, . . . , a_(l-2)). This was used in [15] to devise efficient procedures for applying arbitrary permutations to the plaintext slots.

We note that the values in the plaintext slots are not just bits, rather they are polynomials modulo the irreducible F_(j)'s, so they can be used to represent elements in extension fields GF(2^(d)). In particular, in some of our AES implementations we used the plaintext slots to hold elements of GF(2⁸), and encrypt one byte of the AES state in each slot. Then we can use an adaptation of the techniques from [15] to permute the slots when performing the AES row-shift and column-mix.
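The slot structure can be seen on a small example. The sketch below assumes m=7 purely for illustration, where Φ₇(X) factors modulo 2 into the two irreducible cubics X³+X+1 and X³+X²+1, giving l=2 slots that each hold an element of GF(2³). Polynomials over GF(2) are stored as Python integers (bit k is the coefficient of X^(k)); the function names are hypothetical.

```python
PHI7 = 0b1111111                 # Phi_7(X) = X^6 + X^5 + ... + X + 1
F = [0b1011, 0b1101]             # X^3 + X + 1 and X^3 + X^2 + 1

def gf2_mul(a, b):
    """Carry-less product of two polynomials over GF(2)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, f):
    """Remainder of a modulo f over GF(2)."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

def slots(a):
    """Split a plaintext in A_2 into its slots a mod F_j."""
    return [gf2_mod(a, f) for f in F]

a, b = 0b101101, 0b011011
product = gf2_mod(gf2_mul(a, b), PHI7)       # multiplication in A_2
slotwise = [gf2_mod(gf2_mul(x, y), f) for x, y, f in zip(slots(a), slots(b), F)]
```

Here `slots(product)` equals `slotwise`, and `slots(a ^ b)` equals the entry-wise XOR of the slots, which is the element-wise behaviour described above.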

3 GENERAL-PURPOSE OPTIMIZATIONS

Below we summarize our optimizations that are not tied directly to the AES circuit and can be used also in homomorphic evaluation of other circuits. Underlying many of these optimizations is our choice of keeping ciphertext and key-switching matrices in evaluation (double-CRT) representation. Our chain of moduli is defined via a set of primes of roughly the same size, p₀, . . . , p_(L-1), all chosen such that Z/p_(i)Z has primitive m-th roots of unity. (In other words, m|p_(i)−1 for all i.) For i=0, . . . , L−1 we then define our i'th modulus as q_(i)=Π_(j=0)^(i)p_(j). The primes p₀ and p_(L-1) are special (p₀ is chosen to ensure decryption works, and p_(L-1) is chosen to control noise immediately after encryption), however all other primes p_(i) are of size 2¹⁷≦p_(i)≦2²⁰ if L<100, see Section 7 below.

In the t-th level of the scheme we have ciphertexts consisting of elements in A_(q_(t)) (i.e., polynomials modulo (Φ_(m)(X), q_(t))). We represent an element cεA_(q_(t)) by a φ(m)×(t+1) “matrix” of its evaluations at the primitive m-th roots of unity modulo the primes p₀, . . . , p_(t). Computing this representation from the coefficient representation of c involves reducing c modulo the p_(i)'s and then t+1 invocations of the FFT algorithm, one modulo each of the p_(i) (picking only the FFT coefficients corresponding to (Z/mZ)*). To convert back to coefficient representation we invoke the inverse FFT algorithm t+1 times, each time padding the φ(m)-vector of evaluation points with m−φ(m) zeros (for the evaluations at the non-primitive roots of unity). This yields the coefficients of t+1 polynomials modulo (X^(m)−1,p_(i)) for i=0, . . . , t; we then reduce each of these polynomials modulo (Φ_(m)(X),p_(i)) and apply Chinese Remainder interpolation. We stress that we try to perform these transformations as rarely as we can.
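The two conversions just described can be sketched on a toy instance. The example below rests on several assumptions made only for illustration: m=5 is prime, the chain of primes is 11, 31, 41 (each congruent to 1 modulo 5), a naive quadratic-time DFT stands in for the FFT, and the matrix is stored with one row per prime. All names are hypothetical. The inverse direction pads the evaluations with a zero at the non-primitive root ζ⁰, inverts the size-m DFT, reduces modulo Φ_(m), and then applies Chinese Remainder interpolation.

```python
M = 5                            # toy cyclotomic index (assumed prime for simplicity)
PRIMES = [11, 31, 41]            # toy chain p_0, p_1, p_2 with m | p_i - 1 (assumed)

def root_of_unity(m, p):
    """An element of order exactly m modulo the prime p (m prime, m | p - 1)."""
    for g in range(2, p):
        z = pow(g, (p - 1) // m, p)
        if z != 1:
            return z

def to_double_crt(c):
    """One row per prime p_i, one column per primitive root zeta^j, j = 1..m-1."""
    rows = []
    for p in PRIMES:
        z = root_of_unity(M, p)
        rows.append([sum(x * pow(z, j * k, p) for k, x in enumerate(c)) % p
                     for j in range(1, M)])
    return rows

def from_double_crt(rows):
    """Inverse DFT per prime (zero-padded), reduction mod Phi_m, then integer CRT."""
    per_prime = []
    for p, row in zip(PRIMES, rows):
        inv_m, inv_z = pow(M, -1, p), pow(root_of_unity(M, p), -1, p)
        padded = [0] + row       # zero for the evaluation at the non-primitive root 1
        g = [inv_m * sum(b * pow(inv_z, j * k, p) for j, b in enumerate(padded)) % p
             for k in range(M)]  # coefficients modulo (X^m - 1, p)
        per_prime.append([(x - g[M - 1]) % p for x in g[:M - 1]])   # modulo Phi_m
    q = 1
    for p in PRIMES:
        q *= p
    out = []
    for k in range(M - 1):
        x = sum(col[k] * (q // p) * pow(q // p, -1, p)
                for p, col in zip(PRIMES, per_prime)) % q
        out.append(x - q if x > q // 2 else x)   # centered representative modulo q
    return out

c = [1234, -567, 890, -4321]
matrix = to_double_crt(c)
recovered = from_double_crt(matrix)
```

In this format addition and multiplication of ring elements act entry by entry on the matrix, which is why the conversions above are needed only rarely.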

3.1 A New Variant of Key Switching

As described in Section 2, the key-switching transformation introduces an additive factor of 2⟨c′,e⟩ in the noise, where c′ is the input ciphertext and e is the noise component in the key-switching matrix. To keep the noise magnitude below the modulus q, it seems that we need to ensure that the ciphertext c′ itself has low norm. In BGV [5] this was done by representing c′ as a fixed linear combination of small vectors, i.e. c′=Σ_(i)2^(i)c′_(i) with c′_(i) the vector of i'th bits in c′. Considering the high-dimension ciphertext c*=(c′₀|c′₁|c′₂| . . . ) and secret key s*=(s′|2s′|4s′| . . . ), we note that we have ⟨c*, s*⟩=⟨c′,s′⟩, and c* has low norm (since it consists of 0-1 polynomials). BGV therefore included in the public key the matrix W=W[s*→s] (rather than W[s′→s]), and had the key-switching transformation compute c* from c′ and set c=W·c*.
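The identity behind the bit-decomposition approach can be checked numerically. The sketch below uses plain integers modulo a toy modulus q=1009 in place of ring elements; the modulus, the sample vectors, and the function names are assumptions for illustration only.

```python
Q = 1009                         # toy modulus (assumed)
NBITS = Q.bit_length()           # number of bits needed for residues in [0, q)

def bit_decompose(c):
    """c* = (c_0 | c_1 | ...), where c_i holds the i-th bits of the entries of c."""
    return [((x % Q) >> i) & 1 for i in range(NBITS) for x in c]

def powers_of_two(s):
    """s* = (s | 2s | 4s | ...), reduced modulo q."""
    return [(x << i) % Q for i in range(NBITS) for x in s]

def inner(u, v):
    return sum(x * y for x, y in zip(u, v)) % Q

c = [123, 987, 456]
s = [1, 5, 25]
c_star, s_star = bit_decompose(c), powers_of_two(s)
```

The vector `c_star` contains only zeros and ones, and `inner(c_star, s_star)` equals `inner(c, s)`. The price is visible in the lengths: both vectors grow by a factor of about log q, which is the increase in key-switching matrix size discussed next.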

When implementing key-switching, there are two drawbacks to the above approach. First, this increases the dimension (and hence the size) of the key switching matrix. This drawback is fatal when evaluating deep circuits, since having enough memory to keep the key-switching matrices turns out to be the limiting factor in our ability to evaluate these deep circuits. In addition, for this key-switching we must first convert c′ to coefficient representation (in order to compute the c′_(i)'s), then convert each of the c′_(i)'s back to evaluation representation before multiplying by the key-switching matrix. In level t of the circuit, this seems to require Ω(t log q_(t)) FFTs.

In this work we propose a different variant: Rather than manipulating c′ to decrease its norm, we instead temporarily increase the modulus q. We recall that for a valid ciphertext c′, encrypting plaintext a with respect to s′ and q, we have the equality ⟨c′, s′⟩=2e′+a over A_(q), for a low-norm polynomial e′. This equality, we note, implies that for every odd integer p we have the equality ⟨c′,ps′⟩=2e″+a, holding over A_(pq), for the “low-norm” polynomial e″

$\left(\text{namely } e'' = p\cdot e' + \frac{p-1}{2}a,\ \text{since } p\left(2e'+a\right) = 2\left(p\cdot e' + \frac{p-1}{2}a\right)+a\right).$

Clearly, when considered relative to secret key ps′ and modulus pq, the noise in c′ is p times larger than it was relative to s′ and q. However, since the modulus is also p times larger, we maintain that the noise has norm sufficiently smaller than the modulus. In other words, c′ is still a valid ciphertext that encrypts the same plaintext a with respect to secret key ps′ and modulus pq. By taking p large enough, we can ensure that the norm of c′ (which is independent of p) is sufficiently small relative to the modulus pq.

We therefore include in the public key a matrix W=W[ps′→s] modulo pq for a large enough odd integer p. (Specifically we need p≈q√{square root over (m)}.) Given a ciphertext c′, valid with respect to s′ and q, we apply the key-switching transformation simply by setting c=W·c′ over A_(pq). The additive noise term ⟨c′,e⟩ that we get is now small enough relative to our large modulus pq, thus the resulting ciphertext c is valid with respect to s and pq. We can now switch the modulus back to q (e.g., using our modulus switching routine described below), hence getting a valid ciphertext with respect to s and q.
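The whole sequence (raise the modulus to pq, key-switch, switch the modulus back to q) can be traced on a toy instance. The following is a minimal sketch under loudly stated assumptions, not the implementation: ring elements are replaced by plain integers, the keys have the form (1, s), the moduli q=1009 and p=100003 are arbitrary odd toy values, and every function name is hypothetical. The pair returned by `switch_key_gen` plays the role of the non-trivial column of W[ps′→s].

```python
import random

Q, P = 1009, 100003              # toy odd moduli (assumed); P is the extra factor p

def centered(z, q):
    """Representative of z modulo q in the interval (-q/2, q/2]."""
    r = z % q
    return r - q if r > q // 2 else r

def encrypt(a, s, q, rng):
    """Toy ciphertext (c0, c1) with c0 + c1*s = 2e + a (mod q) for a small e."""
    c1 = rng.randrange(q)
    e = rng.randint(-2, 2)
    return ((a + 2 * e - c1 * s) % q, c1)

def decrypt(c, s, q):
    return centered(c[0] + c[1] * s, q) % 2

def switch_key_gen(s_old, s_new, rng):
    """(w0, w1) with w0 + w1*s_new = P*s_old + 2*e_w (mod P*Q)."""
    w1 = rng.randrange(P * Q)
    e_w = rng.randint(-2, 2)
    return ((P * s_old + 2 * e_w - w1 * s_new) % (P * Q), w1)

def key_switch(c, w):
    """Map a ciphertext under s_old modulo Q to one under s_new modulo P*Q."""
    c0, c1 = c[0], centered(c[1], Q)         # c1 has magnitude below Q/2, independent of P
    return ((P * c0 + c1 * w[0]) % (P * Q), (c1 * w[1]) % (P * Q))

def scale_down(c):
    """Switch from modulus P*Q back to Q, rounding to an integer of equal parity."""
    out = []
    for x in c:
        y = x // P
        if (y - x) % 2:
            y += 1
        out.append(y % Q)
    return tuple(out)

rng = random.Random(0)
s_old, s_new = 5, -1
w = switch_key_gen(s_old, s_new, rng)
c_old = encrypt(1, s_old, Q, rng)            # valid under s_old and Q
c_big = key_switch(c_old, w)                 # valid under s_new and P*Q
c_new = scale_down(c_big)                    # valid under s_new and Q
```

In this toy setting the extra noise term 2·c1·e_w has magnitude about q, which is negligible relative to pq and shrinks to a fraction of one unit after scaling back down by p.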

We note that even though we no longer break c′ into its binary encoding, it seems that we still need to recover it in coefficient representation in order to compute the evaluations of c′ mod p. However, since we do not increase the dimension of the ciphertext vector, this procedure requires only O(t) FFTs in level t (vs. O(t log q_(t))=O(t²) for the original BGV variant). Also, the size of the key-switching matrix is reduced by roughly the same factor of log q_(t).

Our new variant comes with a price tag, however: We use key-switching matrices relative to a larger modulus, but still need the noise term in this matrix to be small. This means that the LWE problem underlying this key-switching matrix has a larger ratio of modulus/noise, implying that we need a larger dimension to get the same level of security as with the original BGV variant. In fact, since our modulus is more than squared (from q to pq with p>q), the dimension is increased by more than a factor of two. This translates to more than doubling of the key-switching matrix, partly negating the size and running time advantage that we get from this variant.

We comment that a hybrid of the two approaches could also be used: we can decrease the norm of c′ only somewhat by breaking it into digits (as opposed to binary bits as in [5]), and then increase the modulus somewhat until it is large enough relative to the smaller norm of c′. Roughly, when we break the ciphertext into some number d of digits, we need the extra factor p to be p≈q^(1/d) or larger. We speculate that the optimal setting in terms of runtime is found around p≈√{square root over (q)}, but so far did not try to explore this tradeoff.

FIG. 3A is a flow diagram illustrating homomorphic evaluation with an example of the new variant of key switching described in this section. FIG. 3A is a logic flow diagram that illustrates the operation of an exemplary method, a result of execution of computer program instructions embodied on a computer readable memory, and/or functions performed by logic implemented in hardware, in accordance with exemplary embodiments of this invention.

Note that the flow in FIG. 3A may be performed by the system 100 (see FIG. 1), e.g., by the one or more processors 104 and/or circuitry 102, e.g., in response to execution of the code 112 in program logic 110. The system 100 may be the search engine server 2, in an exemplary embodiment. In block 300, the system 100 performs the operation of performing homomorphic evaluation of a function on one or more input ciphertexts. The one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys and a plurality of moduli, where the moduli are integers. The performing the homomorphic evaluation of the function comprises performing one or more operations on the input ciphertexts. In an example, a function is to be evaluated, where the function comprises one or multiple operations such as the semantic operations addition, multiplication, and automorphism, described above in Section 2.2. The function can be any arbitrary function, such as (x₁ ³+1)+(x₁x₂)+x₂ ⁷ (as an example of an arbitrary function, where x₁ and x₂ are ciphertexts). As these functions are applied to ciphertext(s), the “maintenance” operations of key switching (see block 310) and modulus switching (described below) are applied to control the complexity of the homomorphic evaluation.

Blocks 310, 320, and 330 illustrate examples of performing one or more operations on the input ciphertexts. In block 310, the system 100 performs the operation of performing a key-switching transformation on selected ones of the one or more input ciphertexts. Performing a key-switching transformation on a selected ciphertext comprises converting a first version of the selected ciphertext with respect to a first of the plurality of secret keys and a first modulus to a second version of the selected ciphertext with respect to a second of the plurality of secret keys and a second modulus. The second modulus is an integer factor p times the first modulus, where p>1. In block 320, each of the key switching transformations is performed prior to or after the one or more operations are evaluated. That is, a key switching transformation may be performed, e.g., after a multiplication operation, after an automorphism, or before other operations (such as modulus switching). In block 330, the system 100 performs the operation of outputting one or more results of the one or more operations. The one or more results may be output to, e.g., the storage 108, the memories 106, or the communications component 114. In block 340, the system 100 performs the operation of outputting one or more results of the evaluation of the function.

Note that there could be multiple operations performed and multiple key-switching transformations performed for a single function. Thus, blocks 310-330 may be performed multiple times prior to block 340 being performed. Furthermore, as illustrated by FIG. 2, there may be a circuit, C, with a number of levels. For instance, there is a description below of an application to AES and its circuits. The functions may be performed in order to evaluate the circuit.

The same key-switching optimization can also be applied to the variant of the cryptosystem proposed by Zvika Brakerski, “Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP”, in Advances in Cryptology, 32nd Annual Cryptology Conference, Santa Barbara, Calif., USA, Aug. 19-23, 2012, Lecture Notes in Computer Science 7417, Springer 2012, CRYPTO 2012, 868-886. In that variant, the different moduli are replaced by representing real numbers with different precision: instead of working modulo an m-bit modulus, we use real numbers with m bits of precision. In this other version, the role of a larger modulus is played by using more bits of precision, and switching to a smaller modulus is performed just by ignoring the least significant bits of the real number (hence using fewer bits of precision). Just as a side-effect of the key-switching transformation in the procedure above is to increase the modulus from q to pq, using the same optimization for the Brakerski variant will increase the precision from log(q) bits to log(pq) bits. Just as above, if we break the ciphertext into d digits (each with log(q)/d bits of precision) then we need p≈q^(1/d).

Commensurate with this, FIG. 3B is a flow diagram illustrating homomorphic evaluation with an example of a new variant of key switching described herein. FIG. 3B is a logic flow diagram that illustrates the operation of an exemplary method, a result of execution of computer program instructions embodied on a computer readable memory, and/or functions performed by logic implemented in hardware, in accordance with exemplary embodiments of this invention.

The flow in FIG. 3B may be performed by the system 100 (see FIG. 1), e.g., by the one or more processors 104 and/or circuitry 102, e.g., in response to execution of the code 112 in program logic 110. The system 100 may be the search engine server 2, in an exemplary embodiment. In block 350, the system 100 performs the operation of performing homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using a public key of an encryption scheme that also comprises a plurality of secret keys. Each input ciphertext comprises a plurality of real numbers that are kept with finite precision. Performing the homomorphic evaluation of the function comprises performing one or more operations. The function comprises one or multiple operations such as the semantic operations addition, multiplication, and automorphism, described above in Section 2.2, and the function can be any arbitrary function.

Blocks 360, 370, and 380 illustrate examples of performing one or more operations on the input ciphertexts. In block 360, the system 100 performs the operation of performing a key-switching transformation on selected ones of the one or more input ciphertexts. Performing the key-switching transformation on a selected ciphertext comprises converting a first version of the selected ciphertext with respect to a first of the plurality of secret keys and with some number r bits of precision to a second version of the selected ciphertext with respect to a second of the plurality of secret keys and with some other number r′ bits of precision, where r′>r. In block 370, each of the key switching transformations is performed prior to or after the one or more operations are evaluated. That is, a key switching transformation may be performed, e.g., after a multiplication operation, after an automorphism, or before other operations (such as modulus switching). In block 380, the system 100 performs the operation of outputting one or more results of the one or more operations. The one or more results may be output to, e.g., the storage 108, the memories 106, or the communications component 114. In block 390, the system 100 performs the operation of outputting one or more results of the evaluation of the function.

Note that there could be multiple operations performed and multiple key-switching transformations performed for a single function. Thus, blocks 360-380 may be performed multiple times prior to block 390 being performed. Furthermore, as illustrated by FIG. 2, there may be a circuit, C, with a number of levels. For instance, there is a description below of an application to AES and its circuits. The functions may be performed in order to evaluate the circuit.

In an example, r′>2r in the method shown in FIG. 3B. In another example, performing the homomorphic evaluation in block 350 further comprises, prior to performing the key switching transformation, decreasing a norm of the first version of the selected ciphertext, by representing every number in the selected ciphertext as a sum of some number d>1 of smaller digits, and where r′>r/d.

An additional example of a key switching transformation is described in reference to FIG. 6.

3.2 Modulus Switching in Evaluation Representation

Given an element cεA_(q_(t)) in evaluation (double-CRT) representation relative to modulus q_(t)=Π_(j=0)^(t)p_(j), we want to modulus-switch to q_(t-1), i.e., scale down by a factor of p_(t); we call this operation Scale(c,q_(t),q_(t-1)). It is noted that an exemplary double CRT representation is described in section 5.3 below. The output should be c′εA, represented via the same double-CRT format (with respect to p₀, . . . , p_(t-1)), such that (a) c′≡c (mod 2), and (b) the “rounding error term” τ=c′−(c/p_(t)) has a very low norm. As p_(t) is odd, we can equivalently require that the element {tilde over (c)}=p_(t)·c′ satisfy the following:

- {tilde over (c)} is divisible by p_(t),
- {tilde over (c)}≡c (mod 2), and
- {tilde over (c)}−c (which is equal to p_(t)·τ) has low norm.

Rather than computing c′ directly, we will first compute {tilde over (c)} and then set c′←{tilde over (c)}/p_(t). Observe that once we compute {tilde over (c)} in double-CRT format, it is easy to output also c′ in double-CRT format: given the evaluations for {tilde over (c)} modulo p_(j) (j<t), simply multiply them by p_(t) ⁻¹ mod p_(j). The algorithm to output {tilde over (c)} in double-CRT format is as follows:

1. Set {overscore (c)} to be the coefficient representation of c mod p_(t). Computing this requires a single “small FFT” modulo the prime p_(t). Recall that the polynomial (c mod p_(t)) is stored (in evaluation representation) in one row of our double-CRT representation of c, so we need to apply inverse-FFT to that row only, to get the same polynomial in coefficient representation.

2. Add or subtract p_(t) from every odd coefficient of {overscore (c)}, so as to obtain a polynomial δ with coefficients in (−p_(t), p_(t)] such that δ≡{overscore (c)}≡c (mod p_(t)) and δ≡0 (mod 2). (That is, all the coefficients of δ are even.) In other words, the end result should be as small as it can be in absolute value, so p_(t) is subtracted from odd coefficients that are greater than zero, and added to odd coefficients that are less than zero.

3. Set {tilde over (c)}=c−δ, and output it in double-CRT representation. Since we already have c in double-CRT representation, the computation of {tilde over (c)} only involves converting the coefficient representation of δ to double-CRT representation, followed by subtraction. Hence it requires just t more “small FFTs” modulo the p_(j)'s.

As all the coefficients of {tilde over (c)} are within p_(t) of those of c, the “rounding error term” τ=({tilde over (c)}−c)/p_(t) has coefficients of magnitude at most one, hence it has low norm.

The procedure above uses t+1 small FFTs in total. This should be compared to the naive method of just converting everything to coefficient representation modulo the primes (t+1 FFTs), CRT-interpolating the coefficients, dividing and rounding appropriately the large integers (of size ≈q_(t)), CRT-decomposing the coefficients, and then converting back to evaluation representation (t+1 more FFTs). The above approach makes explicit use of the fact that we are working in a plaintext space modulo 2; in Section 8 we present a technique which works when the plaintext space is defined modulo a larger modulus.
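The three steps above can be sketched in a simplified setting. To keep the example short, the sketch stores for each prime the residues of the coefficients rather than the evaluations, so the "small FFTs" of steps 1 and 3 are omitted; this is an assumption of the illustration, justified by the fact that the FFT is applied row by row and independently for each prime. The primes, the sample element, and the function names are likewise hypothetical.

```python
PRIMES = [11, 31, 41]            # toy chain p_0, p_1, p_2 (assumed); q_t is their product

def centered(z, q):
    """Representative of z modulo q in the interval (-q/2, q/2]."""
    r = z % q
    return r - q if r > q // 2 else r

def crt_centered(rows, primes):
    """Recombine per-prime residues into centered integers modulo the product."""
    q = 1
    for p in primes:
        q *= p
    return [centered(sum(r * (q // p) * pow(q // p, -1, p)
                         for r, p in zip(col, primes)), q)
            for col in zip(*rows)]

def scale(rows, primes):
    """Drop the last prime p_t; returns the rows of c' modulo p_0, ..., p_(t-1)."""
    p_t = primes[-1]
    c_bar = [centered(r, p_t) for r in rows[-1]]                  # step 1
    delta = [d if d % 2 == 0 else (d - p_t if d > 0 else d + p_t)
             for d in c_bar]                                      # step 2: even, = c mod p_t
    out = []
    for row, p in zip(rows[:-1], primes[:-1]):                    # step 3, then divide by p_t
        inv = pow(p_t, -1, p)
        out.append([((r - d) * inv) % p for r, d in zip(row, delta)])
    return out

c = [1234, -567, 890, -4321]                                      # coefficients modulo q_t
rows = [[x % p for x in c] for p in PRIMES]
c_prime = crt_centered(scale(rows, PRIMES), PRIMES[:-1])
```

Every entry of `c_prime` has the same parity as the corresponding entry of `c` and differs from c/p_(t) by less than one, matching conditions (a) and (b). Only the row for p_(t) has to be brought into coefficient form, which is the source of the savings over the naive method.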

3.3 Dynamic Noise Management

As described in the literature, BGV-type cryptosystems tacitly assume that each homomorphic operation is followed by a modulus switch to reduce the noise magnitude. In an exemplary implementation, however, we attach to each ciphertext an estimate of the noise magnitude in that ciphertext, and use these estimates to decide dynamically when a modulus switch must be performed.

Each modulus switch consumes a level, and hence a goal is to reduce, over a computation, the number of levels consumed. By paying particular attention to the parameters of the scheme, and by carefully analyzing how various operations affect the noise, we are able to control the noise much more carefully than in prior work. In particular, we note that modulus-switching is really only necessary just prior to multiplication (when the noise magnitude is about to get squared); at other times it is acceptable to keep the ciphertexts at a higher level (with higher noise).

FIG. 4 is a flow diagram illustrating an example of operations that could be performed in block 300 of FIG. 3A or block 350 of FIG. 3B for dynamic noise management. The flow in FIG. 4 may be performed by the system 100 (see FIG. 1), e.g., by the one or more processors 104 and/or circuitry 102, e.g., in response to execution of the code 112 in program logic 110. The system 100 may be the search engine server 2, in an exemplary embodiment. In block 412, the system 100 associates with each ciphertext an estimate 410 of the noise magnitude in that ciphertext. Exemplary formulas for the noise evolution may include but are not limited to the following. In all cases the noise estimate before the operation is v and the noise after the operation is v′.

1) Modulus-switching: v′=v·(q_(t-1)/q_(t))+B_(scale) where B_(scale)≈√(φ(m)·h) (e.g., see also Equation (3) below), and h is the number of nonzero coefficients in the secret key.

2) Key-switching: v′=p·v+B_(ks) where B_(ks)≈9φ(m)·q_(t) (e.g., see also Equation (5) below), where σ² is the variance that is used when generating error polynomials.

3) Multiply-by-constant: v′=|k|·v, where |k|≈φ(m)/2 is the magnitude of the constant.

4) Multiply: v′=v₁·v₂·√(φ(m)).

5) Add: v′=v₁+v₂.

6) Automorphism: v′=v.

In block 415, the system 100 determines whether a modulus switching operation should be performed. For instance, the system 100 may determine whether a magnitude of the estimate 410 meets a criterion (e.g., is greater than a threshold). In response to a determination that a modulus switching operation is to be performed (block 415=Yes), a modulus switching operation is performed in block 417, e.g., via the techniques presented in one of Sections 2.2 or 3.2. In block 418, the system 100 resets the estimate 410, e.g., to some default “base estimate” B_(scale). The flow proceeds to block 412. In response to a determination that a modulus switch is not to be performed (block 415=No), additional homomorphic evaluation processing is performed in block 419. Flow proceeds to block 412 so that the associated estimate 410 can be modified (if necessary) for other homomorphic evaluation operations. In these examples, the current estimate includes estimates of a number of previous homomorphic evaluation operations, including the current operation.
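As a non-limiting illustration, the flow of blocks 412 through 419 might be sketched as follows. The class name `Estimate`, the function names, and the `threshold` and `base_estimate` parameters are hypothetical and chosen only for this sketch; the update rules follow the exemplary formulas for Add, Multiply, Multiply-by-constant and Automorphism listed above, and the reset to a base estimate follows block 418.

```python
import math
from dataclasses import dataclass

@dataclass
class Estimate:
    level: int    # index t of the modulus q_t the ciphertext is valid for
    noise: float  # running estimate v of the noise magnitude

def after_add(v1, v2):
    return v1 + v2                       # Add: v' = v1 + v2

def after_multiply(v1, v2, phi_m):
    return v1 * v2 * math.sqrt(phi_m)    # Multiply: v' = v1 * v2 * sqrt(phi(m))

def after_mul_by_constant(v, k_mag):
    return k_mag * v                     # Multiply-by-constant: v' = |k| * v

def after_automorphism(v):
    return v                             # Automorphism: v' = v

def maybe_modulus_switch(est, threshold, base_estimate):
    """Blocks 415-418: switch only when the estimate exceeds the threshold.

    Returns the (possibly updated) estimate and a flag telling the caller
    whether the actual modulus switch of block 417 has to be carried out.
    """
    if est.noise > threshold:
        return Estimate(est.level - 1, base_estimate), True
    return est, False
```

In such a sketch the caller would update the estimate after every homomorphic operation and consult `maybe_modulus_switch` only just before a multiplication, which mirrors the observation that additions and automorphisms can be left at the higher level.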

3.4 Randomized Multiplication by Constants

An exemplary implementation of the AES round function uses just a few multiplication operations (only seven per byte), but it requires a relatively large number of multiplications of encrypted bytes by constants. Hence it becomes important to try and squeeze down the increase in noise when multiplying by a constant. To that end, we encode a constant polynomial in A₂ as a polynomial with coefficients in {−1,0,1}, rather than in {0,1}. Namely, we have a procedure Randomize(α) that takes a polynomial α∈A₂ and replaces each nonzero coefficient with a coefficient chosen uniformly from {−1,1}. By the Chernoff bound, we expect that for α with h nonzero coefficients, the canonical embedding norm of Randomize(α) is bounded by O(√h) with high probability (assuming that h is large enough for the bound to kick in). This yields a better bound on the noise increase than the trivial bound of h that we would get if we just multiplied by α itself. (In Section 5.5, we present a heuristic argument that we use to bound the noise, which yields the same asymptotic bounds but slightly better constants.)
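A minimal sketch of the Randomize procedure, assuming the constant is given as a list of integer coefficients, is shown below. The function name `randomize` and the optional `rng` argument are illustrative assumptions; the only property relied upon is that the output is congruent to the input modulo 2.

```python
import random

def randomize(alpha, rng=random):
    """Replace each coefficient that is nonzero mod 2 by a uniform choice
    from {-1, +1}; coefficients that are zero mod 2 stay zero."""
    return [rng.choice((-1, 1)) if a % 2 else 0 for a in alpha]
```

Because −1 and +1 are both congruent to 1 modulo 2, the randomized polynomial encodes the same plaintext constant while its coefficients behave like independent zero-mean variables.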

4 HOMOMORPHIC EVALUATION OF AES

Next we describe our homomorphic implementation of AES-128. We implemented three distinct implementation possibilities; we first describe the “packed implementation”, in which the entire AES state is packed in just one ciphertext. Two other implementations (of byte-slice and bit-slice AES) are described later in Section 4.2. The “packed” implementation uses the least amount of memory (which turns out to be the main constraint in our implementation), and also has the fastest running time for a single evaluation. The other implementation choices allow more SIMD parallelism, on the other hand, so they can give better amortized running time when evaluating AES on many blocks in parallel.

A Brief Overview of AES

The AES-128 cipher consists of ten applications of the same keyed round function (with different round keys). The round function operates on a 4×4 matrix of bytes, which are sometimes considered as elements of F_(2^8). The basic operations that are performed during the round function are AddKey, SubBytes, ShiftRows, and MixColumns. AddKey is simply an XOR operation of the current state with 16 bytes of key; the SubBytes operation consists of an inversion in the field F_(2^8) followed by a fixed F_2-linear map on the bits of the element (relative to a fixed polynomial representation of F_(2^8)); ShiftRows rotates the entries in row i of the 4×4 matrix by i−1 places to the left; finally, the MixColumns operation pre-multiplies the state matrix by a fixed 4×4 matrix.

An Exemplary Packed Representation of the AES State

For our implementation we chose the native plaintext space of our homomorphic encryption so as to support operations on the finite field F_(2^8). To this end we choose our ring polynomial as Φ_(m)(X) that factors modulo 2 into degree-d irreducible polynomials such that 8|d. (In other words, the smallest integer d such that m|(2^(d)−1) is divisible by 8.) This means that our plaintext slots can hold elements of F_(2^d), and in particular we can use them to hold elements of F_(2^8), which is a sub-field of F_(2^d). Since we have l=φ(m)/d plaintext slots in each ciphertext, we can represent up to ⌊l/16⌋ complete AES state matrices per ciphertext.
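The slot count described above can be checked numerically. The following sketch computes d as the multiplicative order of 2 modulo m, the number of slots l=φ(m)/d, and the number of complete AES states that fit. The function names are hypothetical, and the value m=257 used in the usage note is only a small illustrative value, not a parameter taken from this description; the sketch also does not check the additional condition on the element g discussed next.

```python
from math import gcd

def euler_phi(m):
    """Euler's totient, by direct counting (adequate for small m)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

def order_of_two(m):
    """Smallest d >= 1 with 2^d = 1 (mod m); requires odd m >= 3."""
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be odd and at least 3")
    d, x = 1, 2 % m
    while x != 1:
        x = (2 * x) % m
        d += 1
    return d

def aes_capacity(m):
    """Return (d, number of slots l, number of complete AES states)."""
    d = order_of_two(m)
    slots = euler_phi(m) // d
    states = slots // 16 if d % 8 == 0 else 0
    return d, slots, states
```

For example, `aes_capacity(257)` gives d=16 (divisible by 8), l=16 slots and room for one AES state, whereas `aes_capacity(17)` gives d=8 but only two slots.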

Moreover, we choose our parameter m so that there exists an element g∈Z*_(m) that has order 16 in both Z*_(m) and the quotient group Z*_(m)/⟨2⟩. This condition means that if we put 16 plaintext bytes in slots t, tg, tg², tg³, . . . (for some t∈Z*_(m)), then the conjugation operation X↦X^(g) implements a cyclic right shift over these sixteen plaintext bytes.

In the computation of the AES round function we use several constants. Some constants are used in the S-box lookup phase to implement the AES bit-affine transformation; these are denoted γ and γ_(2^j) for j=0, . . . , 7. In the row-shift/col-mix part we use a constant C_(slot) that has 1 in slots corresponding to t·g^(i) for i=0, 4, 8, 12, and 0 in all the other slots of the form t·g^(i). (Here slot t is where we put the first AES byte.) We also use ‘X’ to denote the constant that has the element X in all the slots.

4.1 Homomorphic Evaluation of the Basic Operations

We now examine each AES operation in turn, and describe how it is implemented homomorphically. For each operation we denote the plaintext polynomial underlying a given input ciphertext c by a, and the corresponding content of the l plaintext slots is denoted as an l-vector (α_(i))_(i=1)^(l), with each α_(i)∈F_(2^8).

4.1.1 AddKey and SubBytes

The AddKey is just a simple addition of ciphertexts, which yields a 4×4 matrix of bytes in the input to the SubBytes operation. We place these 16 bytes in plaintext slots tg^(i) for i=0, 1, . . . , 15, using column-ordering to decide which byte goes in what slot, namely we have

a≈[α ₀₀α₁₀α₂₀α₃₀α₀₁α₁₁α₂₁α₃₁α₀₂α₁₂α₂₂α₃₂α₀₃α₁₃α₂₃α₃₃],

encrypting the input plaintext matrix

$A = {\left( \alpha_{ij} \right)_{i,j} = {\begin{pmatrix} \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} \\ \alpha_{10} & \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{20} & \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{30} & \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix}.}}$

During S-box lookup, each plaintext byte α_(ij) should be replaced by β_(ij)=S(α_(ij)), where S(•) is a fixed permutation on the bytes. Specifically, S(x) is obtained by first computing y=x⁻¹ in F_(2^8) (with 0 mapped to 0), then applying a bitwise affine transformation z=T(y), where elements in F_(2^8) are treated as bit strings with representation polynomial G(X)=X⁸+X⁴+X³+X+1.

We implement F_(2^8) inversion followed by the F_2 affine transformation using the Frobenius automorphisms, X→X^(2^j). Recall that for a power of two k=2^(j), the transformation κ_(k)(a(X))=(a(X^(k)) mod Φ_(m)(X)) is applied separately to each slot, hence we can use it to transform the vector (α_(i))_(i=1)^(l) into (α_(i)^(k))_(i=1)^(l). We note that applying the Frobenius automorphisms to ciphertexts has almost no influence on the noise magnitude, and hence it does not consume any levels. It does increase the noise magnitude somewhat, because we need to do key switching after these automorphisms. But this is only a small influence, and we will ignore it here.

Inversion over F_(2^8) is done using essentially the same procedure as Algorithm 2 from [25] for computing β=α⁻¹=α²⁵⁴. This procedure takes only three Frobenius automorphisms and four multiplications, arranged in a depth-3 circuit (see details below). To apply the AES F_2 affine transformation, we use the fact that any F_2 affine transformation can be computed as an F_(2^8) affine transformation over the conjugates. Thus there are constants γ₀, γ₁, . . . , γ₇, δ∈F_(2^8) such that the AES affine transformation T_(AES)(•) can be expressed as T_(AES)(β)=δ+Σ_(j=0)^7 γ_(j)·β^(2^j) over F_(2^8). We therefore again apply the Frobenius automorphisms to compute eight ciphertexts encrypting the polynomials κ_(k)(b) for k=1, 2, 4, . . . , 128, and take the appropriate linear combination (with coefficients the γ_(j)'s) to get an encryption of the vector (T_(AES)(α_(i)⁻¹))_(i=1)^(l). For our parameters, a multiplication-by-constant operation consumes roughly half a level in terms of added noise.
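The expression of the affine transformation over the conjugates can be illustrated on plain bytes. In the sketch below, the values in `GAMMA` and `DELTA` are the commonly quoted coefficients of the AES affine map written as a linearized polynomial over F_(2^8) with G(X)=X⁸+X⁴+X³+X+1; they are not listed in this description and are stated here as an assumption, which the code checks against the usual bitwise definition for every byte. The function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_(2^8) = F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def frobenius(a, j):
    """Return a^(2^j), obtained by squaring j times."""
    for _ in range(j):
        a = gf_mul(a, a)
    return a

def affine_bitwise(y):
    """Usual bit-level AES affine map: y ^ rotl(y,1..4) ^ 0x63."""
    def rotl(v, s):
        return ((v << s) | (v >> (8 - s))) & 0xFF
    return y ^ rotl(y, 1) ^ rotl(y, 2) ^ rotl(y, 3) ^ rotl(y, 4) ^ 0x63

# Assumed constants gamma_0..gamma_7 and delta (not given in the text above).
GAMMA = [0x05, 0x09, 0xF9, 0x25, 0xF4, 0x01, 0xB5, 0x8F]
DELTA = 0x63

def affine_via_conjugates(beta):
    """T(beta) = delta + sum_j gamma_j * beta^(2^j), computed in F_(2^8)."""
    acc = DELTA
    for j, g in enumerate(GAMMA):
        acc ^= gf_mul(g, frobenius(beta, j))
    return acc
```

In the homomorphic setting each `frobenius` call would correspond to one automorphism applied to a ciphertext, and each `gf_mul` by a constant to one multiply-by-constant operation with a slot-dependent encoding of γ_(j).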

One subtle implementation detail to note here is that although our plaintext slots all hold elements of the same field F_(2^8), they hold these elements with respect to different polynomial encodings. The AES affine transformation, on the other hand, is defined with respect to one particular fixed polynomial encoding. This means that we must implement in the i'th slot not the affine transformation T_(AES)(•) itself but rather the projection of this transformation onto the appropriate polynomial encoding: When we take the affine transformation of the eight ciphertexts encrypting b_(j)=κ_(2^j)(b), we therefore multiply the encryption of b_(j) not by a constant that has γ_(j) in all the slots, but rather by a constant that has in slot i the projection of γ_(j) to the polynomial encoding of slot i.

The table below illustrates a pseudo-code description of an exemplary S-box lookup implementation, together with an approximation of the levels that are consumed by these operations. (These approximations are somewhat under-estimates, however.)

                                                            Level
     Input: ciphertext c                                    t
     // Compute c₂₅₄ = c⁻¹
 1.  c₂ ← c >> 2                                            t         // Frobenius X↦X²
 2.  c₃ ← c × c₂                                            t + 1     // Multiplication
 3.  c₁₂ ← c₃ >> 4                                          t + 1     // Frobenius X↦X⁴
 4.  c₁₄ ← c₁₂ × c₂                                         t + 2     // Multiplication
 5.  c₁₅ ← c₁₂ × c₃                                         t + 2     // Multiplication
 6.  c₂₄₀ ← c₁₅ >> 16                                       t + 2     // Frobenius X↦X¹⁶
 7.  c₂₅₄ ← c₂₄₀ × c₁₄                                      t + 3     // Multiplication
     // Affine transformation over F_2
 8.  c′_(2^j) ← c₂₅₄ >> 2^(j) for j = 0, 1, 2, . . . , 7    t + 3     // Frobenius X↦X^(2^j)
 9.  c″ ← γ + Σ_(j=0)^7 γ_(j) × c′_(2^j)                    t + 3.5   // Linear combination over F_(2^8)
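Steps 1 through 7 of the table can be traced on a single plaintext byte, where the operation written “>> k” amounts to raising to the power k (a power of two) and “×” is multiplication in F_(2^8). The following is only a plaintext-level sketch with hypothetical function names; it says nothing about noise or levels.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_(2^8) = F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def frobenius(a, k):
    """Return a^k for k a power of two, by repeated squaring."""
    n = 1
    while n < k:
        a = gf_mul(a, a)
        n *= 2
    return a

def inverse_by_chain(c):
    """Compute c^254 with three Frobenius maps and four multiplications."""
    c2 = frobenius(c, 2)        # step 1: c^2
    c3 = gf_mul(c, c2)          # step 2: c^3
    c12 = frobenius(c3, 4)      # step 3: c^12
    c14 = gf_mul(c12, c2)       # step 4: c^14
    c15 = gf_mul(c12, c3)       # step 5: c^15
    c240 = frobenius(c15, 16)   # step 6: c^240
    return gf_mul(c240, c14)    # step 7: c^254 = c^(-1), with 0 mapped to 0
```

The multiplications in steps 4 and 5 are independent of each other, which is why the four multiplications fit in a circuit of depth three.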

4.1.2 ShiftRows and MixColumns

As commonly done, we interleave the ShiftRows/MixColumns operations, viewing both as a single linear transformation over vectors from (F_(2^8))¹⁶. As mentioned above, by a careful choice of the parameter m and the placement of the AES state bytes in our plaintext slots, we can implement a rotation-by-i of the rows of the AES matrix as a single automorphism operation X↦X^(g^i) (for some element g∈(Z/mZ)*). Given the ciphertext c″ after the SubBytes step, we use these operations (in conjunction with l-SELECT operations, as described in [15]) to compute four ciphertexts corresponding to the appropriate permutations of the 16 bytes (in each of the l/16 different input blocks). These four ciphertexts are combined via a linear operation (with coefficients 1, X, and (1+X)) to obtain the final result of this round function. The table below shows a pseudo-code of this implementation and an approximation for the levels that it consumes (starting from t+3.5). We note that the permutations are implemented using automorphisms and multiplication by constant, thus we expect them to consume roughly ½ level.

                                                       Level
      Input: ciphertext c″                             t + 3.5
 10.  c_(j)* ← π_(j)(c″) for j = 1, 2, 3, 4            t + 4.0   // Permutations
 11.  Output X · c₁* + (X + 1) · c₂* + c₃* + c₄*       t + 4.5   // Linear combination
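The linear combination in step 11 can be illustrated on a single plaintext column. In the sketch below the four inputs are cyclic rotations of one column, standing in for the four permuted ciphertexts c₁*, . . . , c₄*; the ShiftRows part of the permutations π_(j) is omitted, and all names are hypothetical.

```python
def xtime(a):
    """Multiply a byte by the element X in F_(2^8) (polynomial 0x11B)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def rotate(col, k):
    """Cyclically rotate a list by k positions."""
    return col[k:] + col[:k]

def mix_column(col):
    """Combine four rotated copies as X*c1 + (X+1)*c2 + c3 + c4."""
    c1, c2, c3, c4 = (rotate(col, k) for k in range(4))
    return [xtime(a) ^ (xtime(b) ^ b) ^ c ^ d
            for a, b, c, d in zip(c1, c2, c3, c4)]
```

On the widely used MixColumns test column (0xdb, 0x13, 0x53, 0x45) this yields (0x8e, 0x4d, 0xa1, 0xbc), so only the constants 1, X and X+1 are needed once the rotated copies are available.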

4.1.3 The Cost of One Round Function

The above description yields an estimate of 5 levels for implementing one round function. This is, however, an underestimate. The actual number of levels depends on details such as how sparse the scalars are with respect to the embedding via Φ_(m) in a given parameter set, as well as the accumulation of noise with respect to additions, Frobenius operations, etc. Running over many different parameter sets, we find that the average number of levels per round for this method varies between 5.0 and 6.0.

We mention that the byte-slice and bit-slice implementations, given in Section 4.2 below, can consume fewer levels per round function, since these implementations do not need to permute slots inside a single ciphertext. Specifically, for our byte-sliced implementation, we only need 4.5-5.0 levels per round on average. However, since we need to manipulate many more ciphertexts, the implementation takes much more time per evaluation and requires much more memory. On the other hand it offers wider parallelism, and so yields better amortized time per block. Our bit-sliced implementation should theoretically consume the least number of levels (by purely counting multiplication gates), but the noise introduced by additions means the average number of levels consumed per round varies from 5.0 up to 10.0.

4.2 Byte- and Bit-Slice Implementations

In the byte sliced implementation we use sixteen distinct ciphertexts to represent a single state matrix. (But since each ciphertext can hold l plaintext slots, then these 16 ciphertexts can hold the state of l different AES blocks). In this representation there is no interaction between the slots, thus we operate with pure l-fold SIMD operations. The AddKey and SubBytes steps are exactly as above (except applied to 16 ciphertexts rather than a single one). The permutations in the ShiftRows/MixColumns step are now “for free”, but the scalar multiplication in MixColumns still consumes another level in the modulus chain.

Using the same estimates as above, we expect the number of levels per round to be roughly four (as opposed to the 4.5 of the packed implementation). In practice, again over many parameter sets, we find the average number of levels consumed per round is between 4.5 and 5.0.

For the bit sliced implementation we represent the entire round function as a binary circuit, and we use 128 distinct ciphertexts (one per bit of the state matrix). However each set of 128 ciphertexts is able to represent a total of l distinct blocks. The main issue here is how to create a circuit for the round function which is as shallow, in terms of number of multiplication gates, as possible. Again the main issue is the SubBytes operation, as all other operations are essentially linear. To implement the SubBytes we used the “depth-16” circuit of Boyar and Peralta [3], which consumes four levels. The rest of the round function can be represented as a set of bit-additions. Thus, implementing this method means that we consume a minimum of four levels on computing an entire round function. However, the extensive additions within the Boyar-Peralta circuit mean that we actually end up consuming a lot more. On average this translates into actually consuming between 5.0 and 10.0 levels per round.

4.3 Performance Details

As remarked in the introduction, we implemented the above variant of evaluating AES homomorphically on a very large memory machine; namely a machine with 256 GB of RAM. Firstly parameters were selected, as in Section 7, to cope with 60 levels of computation, and a public/private key pair was generated; along with the key-switching data for multiplication operations and conjugation with-respect-to the Galois group.

The input to the actual computation was an AES plaintext block and the eleven round keys, each of which was encrypted using our homomorphic encryption scheme. Thus the input consisted of eleven packed ciphertexts. Producing the encrypted key schedule took around half an hour. To evaluate the entire ten rounds of AES took just over 36 hours; however each of our ciphertexts could hold 864 plaintext slots of elements in F_(2^8), thus we could have processed 54 such AES blocks in this time period. This would result in a throughput of around forty minutes per AES block.
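The amortized figure quoted above follows from the slot count and the running time, taking the running time as 36 hours for the purpose of this estimate:

```latex
\left\lfloor \frac{864}{16} \right\rfloor = 54 \ \text{blocks per ciphertext},
\qquad
\frac{36\ \text{h} \times 60\ \text{min/h}}{54\ \text{blocks}} = 40\ \text{min per block}.
```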

We note that as the algorithm progressed the operations became faster. The first round of the AES function took 7 hours, whereas the penultimate round took 2 hours and the last round took 30 minutes. Recall, the last AES round is somewhat simpler as it does not involve a MixColumns operation.

Whilst our other two implementation choices (given in Section 4.2 above) may seem to yield better amortized per-block timing, the increase in memory requirements and data actually makes them less attractive when encrypting a single block. For example just encrypting the key schedule in the Byte-Sliced variant takes just under 5 hours (with 50 levels), with an entire encryption taking 65 hours (12 hours for the first round, with between 4 and 5 hours for both the penultimate and final rounds). This however equates to an amortized time of just over five minutes per block.

The Bit-Sliced variant requires over 150 hours to just encrypt the key schedule (with 60 levels), and evaluating a single round takes so long that our program is timed out before even a single round is evaluated.

5 MORE DETAILS

Following [22, 5, 15, 27] we utilize rings defined by cyclotomic polynomials, A=Z[X]/Φ_(m)(X). We let A_(q) denote the set of elements of this ring reduced modulo various (possibly composite) moduli q. The ring A is the ring of integers of the m-th cyclotomic number field K.

5.1 Plaintext Slots

In an exemplary scheme, plaintexts will be elements of A₂, and the polynomial Φ_(m)(X) factors modulo 2 into l irreducible factors, Φ_(m)(X)=F₁(X)·F₂(X) . . . F_(l)(X) (mod 2), all of degree d=φ(m)/l. Just as in [5, 15, 27] each factor corresponds to a “plaintext slot”. That is, we view a polynomial a∈A₂ as representing an l-vector (a mod F_(i))_(i=1)^(l).

It is a standard fact that the Galois group Gal=Gal(Q(ζ_(m))/Q) consists of the mappings κ_(k): a(X)↦a(X^(k)) mod Φ_(m)(X) for all k co-prime with m, and that it is isomorphic to (Z/mZ)*. As noted in [15], for each i, j∈{1, 2, . . . , l} there is an element κ_(k)∈Gal which sends an element in slot i to an element in slot j. Namely, if b=κ_(k)(a) then the element in the j'th slot of b is the same as that in the i'th slot of a. In addition Gal contains the Frobenius elements, X↦X^(2^j), which also act as Frobenius on the individual slots separately.

For the purpose of implementing AES we will be specifically interested in arithmetic in F_(2^8) (represented as F_(2^8)=F_2[X]/G(X) with G(X)=X⁸+X⁴+X³+X+1). We choose the parameters so that d is divisible by 8, so F_(2^d) includes F_(2^8) as a subfield. This lets us think of the plaintext space as containing l-vectors over F_(2^8).

5.2 Canonical Embedding Norm

Following [22], we use as the “size” of a polynomial a∈A the l_(∞) norm of its canonical embedding. Recall that the canonical embedding of a∈A into C^(φ(m)) is the φ(m)-vector of complex numbers σ(a)=(a(ζ_(m)^(i)))_(i), where ζ_(m) is a complex primitive m-th root of unity and the indexes i range over all of (Z/mZ)*. We call the norm of σ(a) the canonical embedding norm of a, and denote it by

∥a∥_(∞)^(can)=∥σ(a)∥_(∞).

We will make use of the following properties of ∥•∥_(∞)^(can):

-   For all a, b∈A we have ∥a·b∥_(∞)^(can)≦∥a∥_(∞)^(can)·∥b∥_(∞)^(can).
-   For all a∈A we have ∥a∥_(∞)^(can)≦∥a∥₁.
-   There is a ring constant c_(m) (depending only on m) such that ∥a∥_(∞)≦c_(m)·∥a∥_(∞)^(can) for all a∈A.
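For small parameters the canonical embedding norm can be evaluated numerically, as in the following sketch. It assumes the polynomial is given by its integer coefficient list (lowest degree first) and uses floating-point complex arithmetic, so results are approximate; the function names are hypothetical.

```python
import cmath
from math import gcd

def canonical_embedding(coeffs, m):
    """sigma(a): evaluations of a at all complex primitive m-th roots of unity."""
    roots = [cmath.exp(2j * cmath.pi * i / m)
             for i in range(1, m) if gcd(i, m) == 1]
    return [sum(c * z ** k for k, c in enumerate(coeffs)) for z in roots]

def can_norm(coeffs, m):
    """l_infinity norm of the canonical embedding of a."""
    return max(abs(v) for v in canonical_embedding(coeffs, m))
```

For m=5, for instance, the polynomial 1+X+X²+X³ has coefficient l₁ norm 4 but canonical embedding norm 1, since it equals −X⁴ modulo Φ₅(X); this is consistent with the second property listed above.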

The ring constant c_(m) is defined by c_(m)=∥CRT_(m)⁻¹∥_(∞), where CRT_(m) is the CRT matrix for m, i.e. the Vandermonde matrix over the complex primitive m-th roots of unity. Asymptotically the value c_(m) can grow super-polynomially with m, but for the “small” values of m one would use in practice, the value of c_(m) can be evaluated directly. See [11] for a discussion of c_(m).

Canonical Reduction

When working with elements in A_(q) for some integer modulus q, we sometimes need a version of the canonical embedding norm that plays nice with reduction modulo q. Following [15], we define the canonical embedding norm reduced modulo q of an element aεA as the smallest canonical embedding norm of any a′ which is congruent to a modulo q. We denote it as

${a}_{q}^{can}\overset{def}{=}{\min {\left\{ {{{{a^{\prime}}_{\infty}^{can}\text{:}\mspace{14mu} a^{\prime}} \in },{a^{\prime} \equiv {a\left( {{mod}\; q} \right)}}} \right\}.}}$

We sometimes also denote the polynomial where the minimum is obtained by [a]_(q)^(can), and call it the canonical reduction of a modulo q. Neither the canonical embedding norm nor the canonical reduction is used in the scheme itself; they are needed only in its analysis. We note that (trivially) we have |a|_(q)^(can)≦∥a∥_(∞)^(can).

5.3 Double CRT Representation

As noted in Section 2, we usually represent an element a∈A_(q) via double-CRT representation, with respect to both the polynomial factors of Φ_(m)(X) and the integer factors of q. Specifically, we assume that Z/qZ contains a primitive m-th root of unity (call it ζ), so Φ_(m)(X) factors modulo q to linear terms Φ_(m)(X)=Π_(j∈(Z/mZ)*)(X−ζ^(j)) (mod q). We also denote q's prime factorization by q=Π_(i=0)^(t)p_(i). Then a polynomial a∈A_(q) is represented as the (t+1)×φ(m) matrix of its evaluation at the roots of Φ_(m)(X) modulo p_(i) for i=0, . . . , t:

dble-CRT^(t)(a)=(a(ζ^(j)) mod p_(i))_(0≦i≦t, j∈(Z/mZ)*).

The double CRT representation can be computed using t+1 invocations of the FFT algorithm modulo the p_(i)'s, picking only the FFT coefficients which correspond to elements in (Z/mZ)*. To invert this representation we invoke the inverse FFT algorithm t+1 times on a vector of length m consisting of the thinned out values padded with zeros, then apply the Chinese Remainder Theorem, and then reduce modulo Φ_(m)(X) and q.

Addition and multiplication in A_(q) can be computed as component-wise addition and multiplication of the entries in the two tables as follows (modulo the appropriate primes p_(i)),

dble-CRT^(t)(a+b)=dble-CRT^(t)(a)+dble-CRT^(t)(b),

dble-CRT^(t)(a·b)=dble-CRT^(t)(a)·dble-CRT^(t)(b).

Also, for an element of the Galois group κ_(k)∈Gal (which maps a(X)∈A to a(X^(k)) mod Φ_(m)(X)), we can evaluate κ_(k)(a) on the double-CRT representation of a just by permuting the columns in the matrix, sending each column j to column j·k mod m.
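A toy version of the double-CRT representation, using direct evaluation instead of FFTs, might look as follows. The helper names and the small parameters mentioned in the usage note (m=5 with the primes 11 and 31) are assumptions made for this sketch only. Polynomials are integer coefficient lists, and the root of unity modulo each prime is found by brute force. In the indexing used here, column j of the matrix for a(X^(k)) is read from column j·k mod m of the matrix for a.

```python
from math import gcd

def root_of_unity(m, p):
    """Return an element of exact multiplicative order m modulo the prime p."""
    if (p - 1) % m != 0:
        raise ValueError("need m | p - 1")
    for x in range(2, p):
        z = pow(x, (p - 1) // m, p)
        if all(pow(z, d, p) != 1 for d in range(1, m)):
            return z
    raise ValueError("no primitive m-th root of unity found")

def columns(m):
    """Column labels: the elements of (Z/mZ)* in increasing order."""
    return [j for j in range(1, m) if gcd(j, m) == 1]

def dble_crt(coeffs, m, primes):
    """Matrix with one row per prime p_i and one column per j in (Z/mZ)*."""
    rows = []
    for p in primes:
        zeta = root_of_unity(m, p)
        rows.append([sum(c * pow(zeta, j * k, p) for k, c in enumerate(coeffs)) % p
                     for j in columns(m)])
    return rows

def entrywise(op, A, B, primes):
    """Apply op entry by entry, reducing row i modulo p_i."""
    return [[op(x, y) % p for x, y in zip(ra, rb)]
            for ra, rb, p in zip(A, B, primes)]

def automorphism(A, m, k):
    """Matrix of a(X^k): a pure permutation of the columns of A."""
    cols = columns(m)
    pos = {j: idx for idx, j in enumerate(cols)}
    return [[row[pos[(j * k) % m]] for j in cols] for row in A]
```

With m=5 and primes 11 and 31, each polynomial is held as a 2×4 matrix, and the matrix of a product equals the entry-by-entry product of the two input matrices, with no polynomial reduction needed.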

Turning to FIG. 10, a logic flow diagram is shown that illustrates the operation of an exemplary method, a result of execution of computer program instructions embodied on a computer readable memory, and/or functions performed by logic implemented in hardware, in accordance with exemplary embodiments of this invention. The operations in FIG. 10 are described herein, e.g., in reference to the instant section (Section 5.3) and to Section 3.2 above. The flow in FIG. 10 may be performed by the system 100 (see FIG. 1), e.g., by the one or more processors 104 and/or circuitry 102, e.g., in response to execution of the code 112 in program logic 110. The system 100 may be the search engine server 2, in an exemplary embodiment. In block 1010, the system 100 performs the operation of performing a homomorphic evaluation of a function on one or more input ciphertexts. The one or more input ciphertexts were encrypted using an encryption scheme that includes a plurality of integer moduli, where each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer. Each ciphertext which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q, and where content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X). Performing the homomorphic evaluation of the function further comprises performing one or more operations using one or more matrices from one or more of the ciphertexts. See block 1010. In block 1020, the system 100 performs the operation of outputting one or more results of the one or more operations. Such output could be to a memory and/or a network.

The method of FIG. 10 may include where the one or more operations comprise homomorphic multiplication operations of two ciphertexts performed by entry-by-entry multiplication of matrices from the two ciphertexts. The method of FIG. 10 may also include where the one or more operations comprise automorphism of a ciphertext performed by permuting columns of the matrices from the ciphertext.

The method of FIG. 10 may further include where the plurality of moduli consist of products of smaller primes p_(i), where the t-th modulus q_(t) is the product of the first t smaller primes, q_(t)=Π_(i=1)^(t)p_(i) (where “smaller” in this context means smaller than q). Furthermore, for each small prime p_(i), p_(i)−1 may be divisible by m, where m is an integer defining the m-th cyclotomic number field. Additionally, the one or more operations from block 1010 may comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext. Performing the modulus switching operation may comprise scaling down each element a(X) of the m-th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises:

setting ā(X) to be a coefficient representation of a(X) mod p_(t);

performing one of adding or subtracting p_(t) from every odd coefficient of ā(X), thereby obtaining a polynomial δ(X) with coefficients in (−p_(t), p_(t)];

computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i), F_(j)(X));

subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and

dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).
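The scaling steps listed above can be traced on integer coefficients. The sketch below is a simplification that works entirely in coefficient representation (so the conversion of δ(X) to matrix form is skipped) and assumes that p_(t) is an odd prime and that the plaintext space is modulo 2; the function names are hypothetical.

```python
def centered(c, p):
    """Reduce c modulo odd p into the interval (-p/2, p/2]."""
    r = c % p
    return r - p if r > p // 2 else r

def scale_down(a, p_t):
    """Scale a coefficient vector by 1/p_t while preserving parity.

    Returns (a_prime, delta) with a = p_t * a_prime + delta coefficient-wise,
    every delta coefficient even and lying in (-p_t, p_t].
    """
    delta = []
    for c in a:
        d = centered(c, p_t)              # a-bar: a mod p_t
        if d % 2:                         # odd: add or subtract the odd p_t
            d = d - p_t if d > 0 else d + p_t
        delta.append(d)
    a_tilde = [c - d for c, d in zip(a, delta)]   # divisible by p_t
    return [c // p_t for c in a_tilde], delta     # exact division
```

Since δ is even and p_(t) is odd, the output satisfies a′≡a (mod 2), which is the property that lets the scaled ciphertext decrypt to the same plaintext bit pattern.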

As stated above, the method of FIG. 10 may further include where the plurality of moduli consist of products of small primes p_(i). Additionally, the one or more operations from block 1010 may comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, where performing the modulus switching operation comprises scaling down each element a(X) of the m-th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises:

setting ā(X) to be a coefficient representation of a(X) mod p_(t);

adding or subtracting multiples of p_(t) to every coefficient of ā(X), thereby obtaining a polynomial δ(X) where all the coefficients of δ(X) are divisible by an integer r, where r is co-prime with p_(t);

computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i), F_(j)(X));

subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and

dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).

5.4 Sampling From A_(q)

At various points we will need to sample from A_(q) with different distributions, as described below. We denote choosing the element a∈A according to distribution D by a←D. The distributions below are described as over φ(m)-vectors, but we always consider them as distributions over the ring A, by identifying a polynomial a∈A with its coefficient vector.

The uniform distribution U_(q): This is just the uniform distribution over (Z/qZ)^(φ(m)), which we identify with (Z∩(−q/2, q/2])^(φ(m)). Note that it is easy to sample from U_(q) directly in double-CRT representation.

The “discrete Gaussian” DG_(q)(σ²): Let N(0,σ²) denote the normal (Gaussian) distribution on real numbers with zero mean and variance σ². We use drawing from N(0,σ²) and rounding to the nearest integer as an approximation to the discrete Gaussian distribution. Namely, the distribution DG_(q)(σ²) draws a real φ(m)-vector according to N(0,σ²)^(φ(m)), rounds it to the nearest integer vector, and outputs that integer vector reduced modulo q (into the interval (−q/2,q/2]).

Sampling small polynomials, ZO(ρ) and HWT(h): These distributions produce vectors in {0,±1}^(φ(m)).

For a real parameter ρ∈[0,1], ZO(ρ) draws each entry in the vector from {0,±1}, with probability ρ/2 for each of −1 and +1, and probability of being zero 1−ρ.

For an integer parameter h≦φ(m), the distribution HWT(h) chooses a vector uniformly at random from {0,±1}^(φ(m)), subject to the condition that it has exactly h nonzero entries.
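The four samplers can be sketched with a general-purpose pseudo-random generator as below. This is an illustration of the distributions only: the function names are hypothetical, the vectors are plain coefficient lists of length n=φ(m), and Python's `random` module is not a cryptographically secure source of randomness.

```python
import random

def centered_mod(x, q):
    """Reduce x modulo q into the interval (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def sample_uniform(n, q, rng):
    """Uniform vector over the integers in (-q/2, q/2]."""
    lo = -(q // 2) + (1 if q % 2 == 0 else 0)
    return [rng.randint(lo, q // 2) for _ in range(n)]

def sample_gaussian(n, q, sigma2, rng):
    """Rounded real Gaussian with variance sigma2, reduced modulo q."""
    sigma = sigma2 ** 0.5
    return [centered_mod(round(rng.gauss(0.0, sigma)), q) for _ in range(n)]

def sample_zo(n, rho, rng):
    """Each entry is -1 or +1 with probability rho/2 each, else 0."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(-1 if u < rho / 2 else (1 if u < rho else 0))
    return out

def sample_hwt(n, h, rng):
    """Vector in {0, +1, -1}^n with exactly h nonzero entries."""
    v = [0] * n
    for i in rng.sample(range(n), h):
        v[i] = rng.choice((-1, 1))
    return v
```

A call such as `sample_hwt(64, 7, random.Random(1))` returns a length-64 vector with exactly seven entries in {−1,+1}.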

5.5 Canonical embedding norm of random polynomials

In the coming sections we will need to bound the canonical embedding norm of polynomials that are produced by the distributions above, as well as products of such polynomials. In some cases it is possible to analyze the norm rigorously using Chernoff and Hoeffding bounds, but to set the parameters of our scheme we instead use a heuristic approach that yields better constants:

Let a∈A be a polynomial that was chosen by one of the distributions above, hence all the (nonzero) coefficients in a are IID (independently identically distributed). For a complex primitive m-th root of unity ζ_(m), the evaluation a(ζ_(m)) is the inner product between the coefficient vector of a and the fixed vector z_(m)=(1, ζ_(m), ζ_(m)², . . . ), which has Euclidean norm exactly √(φ(m)). Hence the random variable a(ζ_(m)) has variance V=σ²φ(m), where σ² is the variance of each coefficient of a. Specifically, when a←U_(q) then each coefficient has variance q²/12, so we get variance V_(U)=q²φ(m)/12. When a←DG_(q)(σ²) we get variance V_(G)≈σ²φ(m), and when a←ZO(ρ) we get variance V_(Z)=ρφ(m). When choosing a←HWT(h) we get a variance of V_(H)=h (but not φ(m), since a has only h nonzero coefficients).

Moreover, the random variable a(ζ_(m)) is a sum of many IID random variables, hence by the law of large numbers it is distributed similarly to a complex Gaussian random variable of the specified variance. The mean of a(ζ_(m)) is zero, since the coefficients of a are chosen from a zero-mean distribution. We therefore use 6√V (i.e. six standard deviations) as a high-probability bound on the size of a(ζ_(m)). Since the evaluation of a at all the roots of unity obeys the same bound, we use six standard deviations as our bound on the canonical embedding norm of a. (We chose six standard deviations since erfc(6)≈2⁻⁵⁵, which is good enough for us even when using the union bound and multiplying it by φ(m)≈2¹⁶.)

In many cases we need to bound the canonical embedding norm of a product of two such “random polynomials”. In this case our task is to bound the magnitude of the product of two random variables, both are distributed close to Gaussians, with variances σ_(a) ², σ_(b) ², respectively. For this case we use 16σ_(a)σ_(b) as our bound, since erfc (4)≈2⁻²⁵, so the probability that both variables exceed their standard deviation by more than a factor of four is roughly 2⁻⁵⁰.
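The heuristic bounds of this section reduce to a few closed-form expressions, collected in the sketch below. The function names are hypothetical; the formulas are the variances V_(U), V_(G), V_(Z), V_(H) and the six-standard-deviation and 16σ_(a)σ_(b) bounds stated above.

```python
import math

def variance_uniform(q, phi_m):
    return q * q * phi_m / 12        # V_U = q^2 * phi(m) / 12

def variance_gaussian(sigma2, phi_m):
    return sigma2 * phi_m            # V_G ~ sigma^2 * phi(m)

def variance_zo(rho, phi_m):
    return rho * phi_m               # V_Z = rho * phi(m)

def variance_hwt(h):
    return h                         # V_H = h

def norm_bound(variance):
    """Six standard deviations, used as the bound for a single polynomial."""
    return 6 * math.sqrt(variance)

def product_bound(var_a, var_b):
    """16 * sigma_a * sigma_b, used as the bound for a product of two."""
    return 16 * math.sqrt(var_a) * math.sqrt(var_b)
```

For example, a secret key drawn with h=64 nonzero coefficients would be assigned the bound `norm_bound(variance_hwt(64))`, i.e. 48, under this heuristic.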

6. The Basic Scheme

We now define our leveled HE scheme on L levels, including the Modulus-Switching and Key-Switching operations and the procedures for KeyGen, Enc, Dec, and for Add, Mult, Scalar-Mult, and Automorphism.

Recall that a ciphertext vector c in the cryptosystem is a valid encryption of a∈A with respect to secret key s and modulus q if [[⟨c,s⟩]_(q)]₂=a, where the inner product is over A=Z[X]/Φ_(m)(X), the operation [•]_(q) denotes modular reduction in coefficient representation into the interval (−q/2, +q/2], and we require that the “noise” [⟨c,s⟩]_(q) is sufficiently small (in canonical embedding norm reduced mod q). In an exemplary implementation, a “normal” ciphertext is a 2-vector c=(c₀,c₁), and a “normal” secret key is of the form s=(1,−s), hence decryption takes the form

a←[c ₀ −c ₁ ·s] _(q) mod 2.  (2)

6.1 Our Moduli Chain

We define the chain of moduli for our depth-L homomorphic evaluation by choosing L “small primes” p₀, p₁, . . . , p_(L-1); the t'th modulus in our chain is defined as q_(t)=Π_(j=0)^(t) p_(j). (The sizes will be determined later.) The primes p_(i) are chosen so that for all i, ℤ/p_(i)ℤ contains a primitive m-th root of unity. Hence we can use our double-CRT representation for all 𝔸_(q_(t)).
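The construction of the chain can be sketched as follows. This is a toy illustration only: the function names, the trial-division primality test, and the tiny parameters (m=5, three primes) are assumptions made for the example, whereas the actual prime sizes are the ones derived in Section 7.2.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for toy sizes only)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def small_primes(m: int, count: int, start: int = 2) -> list:
    """Return the first `count` primes p >= start with p ≡ 1 (mod m).

    (Z/pZ)* is cyclic of order p - 1, so it contains a primitive m-th
    root of unity exactly when m divides p - 1.
    """
    primes = []
    p = (start // m) * m + 1          # a candidate ≡ 1 (mod m) near `start`
    if p < start:
        p += m
    while len(primes) < count:
        if is_prime(p):
            primes.append(p)
        p += m
    return primes


def moduli_chain(primes: list) -> list:
    """q_t = p_0 * p_1 * ... * p_t for t = 0, ..., L-1."""
    chain, q = [], 1
    for p in primes:
        q *= p
        chain.append(q)
    return chain


primes = small_primes(m=5, count=3)
chain = moduli_chain(primes)
print(primes, chain)
```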

This choice of moduli makes it easy to get a level-(t−1) representation of a∈𝔸 from its level-t representation. Specifically, given the level-t double-CRT representation dble-CRT^(t)(a) for some a∈𝔸_(q_(t)), we can simply remove from the matrix the row corresponding to the last small prime p_(t), thus obtaining a level-(t−1) representation of a mod q_(t-1)∈𝔸_(q_(t-1)). Similarly we can get the double-CRT representation for lower levels by removing more rows. By a slight abuse of notation we write dble-CRT^(t′)(a)=dble-CRT^(t)(a) mod q_(t′) for t′<t.
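The row-removal property can be illustrated on a toy instance. The sketch below assumes m=5 and the primes 11, 31, 41 (all ≡ 1 mod 5), finds roots of unity by brute force, and uses the hypothetical helper names `mult_order` and `dble_crt`; it is not the representation code of the described implementation.

```python
def mult_order(x: int, p: int) -> int:
    """Multiplicative order of x modulo the prime p (x not divisible by p)."""
    k, y = 1, x % p
    while y != 1:
        y = (y * x) % p
        k += 1
    return k


def dble_crt(coeffs, primes, m):
    """Matrix with one row per prime p_i and one column per primitive
    m-th root of unity modulo p_i; entry = a(root) mod p_i."""
    matrix = []
    for p in primes:
        roots = [r for r in range(1, p) if mult_order(r, p) == m]
        row = [sum(c * pow(r, k, p) for k, c in enumerate(coeffs)) % p
               for r in roots]
        matrix.append(row)
    return matrix


m, primes = 5, [11, 31, 41]
a = [12345, 678, 9012, 3456]             # coefficients modulo q_2 = 13981

level2 = dble_crt(a, primes, m)          # rows for p_0, p_1, p_2

# Level-1 representation computed directly from a mod q_1 = 11 * 31.
a_mod_q1 = [c % (11 * 31) for c in a]
level1 = dble_crt(a_mod_q1, primes[:2], m)

print(level2[:2] == level1)              # dropping the last row gives level 1
```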

Recall that encryption produces ciphertext vectors valid with respect to the largest modulus q_(L-1) in our chain, and we obtain ciphertext vectors valid with respect to smaller moduli whenever we apply modulus-switching to decrease the noise magnitude. As described in Section 3.3, our implementation dynamically adjusts levels, performing modulus switching when the dynamically-computed noise estimate becomes too large. Hence each ciphertext in our scheme is tagged with both its level t (pinpointing the modulus q_(t) relative to which this ciphertext is valid), and an estimate v on the noise magnitude in this ciphertext. In other words, a ciphertext is a triple (c,t,v) with 0≦t≦L−1, c a vector over 𝔸_(q_(t)), and v a real number which is used as our noise estimate.

6.2 Modulus Switching

The operation SwitchModulus(c) takes the ciphertext c=((c₀,c₁),t,v) defined modulo q_(t) and produces a ciphertext c′=((c′₀,c′₁),t−1,v′) defined modulo q_(t-1), such that [c₀−s·c₁]_(q_(t))≡[c′₀−s·c′₁]_(q_(t-1)) (mod 2), and v′ is smaller than v. This procedure makes use of the function Scale(x,q,q′) that takes an element x∈𝔸_(q) and returns an element y∈𝔸_(q′) such that in coefficient representation it holds that y≡x (mod 2), and y is the closest element to (q′/q)·x that satisfies this mod-2 condition.
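In coefficient representation, the two conditions on Scale can be met one coefficient at a time. The following is a minimal sketch under that assumption; the names `scale_coeff` and `scale` are ours, and exact rational arithmetic is used only to keep the rounding unambiguous.

```python
from fractions import Fraction


def scale_coeff(x: int, q: int, q_new: int) -> int:
    """Return the integer y with y ≡ x (mod 2) closest to (q_new / q) * x."""
    target = Fraction(q_new * x, q)
    # Integers with the parity of x are exactly x + 2k; choose the k that
    # brings x + 2k nearest to the target.
    k = round((target - x) / 2)
    return x + 2 * k


def scale(x, q, q_new):
    """Apply scale_coeff to every coefficient of x (centered mod q)."""
    return [scale_coeff(c, q, q_new) for c in x]


q, q_new = 13981, 341                    # toy moduli q_t and q_{t-1}
x = [6990, -6990, 1234, -17]
y = scale(x, q, q_new)
print(y)
```

The result differs from (q′/q)·x by at most 1 in each coefficient, which is the "rounding error" that the noise analysis accounts for.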

To maintain the noise estimate, the procedure uses the pre-set ring-constant c_(m) (cf. Section 5.2) and also a pre-set constant B_(scale) which is meant to bound the magnitude of the added noise term from this operation. It works as shown in FIG. 5.

The constant B_(scale) is set as B_(scale)=2√(φ(m)/3)·(8√h+3), where h is the Hamming weight of the secret key. (In an exemplary embodiment, we use h=64, so we get B_(scale)≈77√(φ(m)).) To justify this choice, we appeal to the proof of the modulus switching lemma from [15, Lemma 13] (in the full version), relative to the canonical embedding norm. In that proof it is shown that when the noise magnitude in the input ciphertext c=(c₀,c₁) is bounded by v, then the noise magnitude in the output vector c′=(c′₀,c′₁) is bounded by

$v^{\prime} = \frac{q_{t-1}}{q_{t}} \cdot v + \left\| \langle s,\tau \rangle \right\|_{\infty}^{can},$

provided that the last quantity is smaller than q_(t-1)/2.

Above τ is the “rounding error” vector, namely

$\tau \overset{def}{=}{\left( {\tau_{0},\tau_{1}} \right) = {\left( {c_{0}^{\prime},c_{1}^{\prime}} \right) - {\frac{q_{t - 1}}{q_{t}}{\left( {c_{0},c_{1}} \right).}}}}$

Heuristically assuming that τ behaves as if its coefficients are chosen uniformly in [−1, +1], the evaluation τ_(i)(ζ_(m)) at an m-th root of unity ζ_(m) is distributed close to a complex Gaussian with variance φ(m)/3. Also, s was drawn from HWT(h), so s(ζ_(m)) is distributed close to a complex Gaussian with variance h. Hence we expect τ₁(ζ_(m))·s(ζ_(m)) to have magnitude at most 16√(φ(m)/3·h) (recall that we use h=64). We can similarly bound τ₀(ζ_(m)) by 6√(φ(m)/3), and therefore the evaluation of ⟨s,τ⟩ at ζ_(m) is bounded in magnitude (whp) by:

16√(φ(m)/3·h)+6√(φ(m)/3)=2√(φ(m)/3)·(8√h+3)≈77√(φ(m))=B_(scale).  (3)
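The arithmetic behind Equation (3) can be reproduced directly. In this sketch the factor √(φ(m)) is divided out, so the functions return only the numeric coefficient; the function names are illustrative.

```python
import math


def two_term_bound(h: int) -> float:
    """16*sqrt(h/3) + 6*sqrt(1/3): the product term plus the tau_0 term."""
    return 16 * math.sqrt(h / 3) + 6 * math.sqrt(1 / 3)


def b_scale_coefficient(h: int) -> float:
    """2*sqrt(1/3)*(8*sqrt(h) + 3): the factored form of the same bound."""
    return 2 * math.sqrt(1 / 3) * (8 * math.sqrt(h) + 3)


h = 64
print(f"{two_term_bound(h):.2f} {b_scale_coefficient(h):.2f}")
```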

6.3 Key Switching

After some homomorphic evaluation operations we have on our hands not a “normal” ciphertext which is valid relative to a “normal” secret key, but rather an “extended ciphertext” ((d₀, d₁, d₂), q_(t), v) which is valid with respect to an “extended secret key” s′=(1, −s, −s′). Namely, this ciphertext encrypts the plaintext a∈𝔸 via

a=[[d₀−s·d₁−s′·d₂]_(q_(t))]₂,

and the magnitude of the noise [d₀−s·d₁−d₂·s′]_(q_(t)) is bounded by v. In our implementation, the component s is always the same element s∈𝔸 that was drawn from HWT(h) during key generation, but s′ can vary depending on the operation. (See the description of multiplication and automorphisms below.)

To translate such an extended ciphertext back into a normal one, we use some “key switching matrices” that are included in the public key. (In an exemplary implementation these “matrices” have dimension 2×1, i.e., they consist of only two elements from 𝔸.) As explained in Section 3.1, we save on space and time by artificially “boosting” the modulus we use from q_(t) up to P·q_(t) for some “large” modulus P. We note that in order to represent elements in 𝔸_(Pq_(t)) using our dble-CRT representation we need to choose P so that ℤ/Pℤ also has primitive m-th roots of unity. (In fact in one implementation we pick P to be a prime.)

The Key-Switching “Matrix”.

Denote by Q=P·q_(L-2) the largest modulus relative to which we need to generate key-switching matrices. To generate the key-switching matrix from s′=(1,−s,−s′) to s=(1,−s) (note that both keys share the same element s), we choose two elements, one uniform and the other from our “discrete Gaussian”,

a_(s,s′)←U_(Q) and e_(s,s′)←DG(σ²),

where the variance σ² is a global parameter (that we later set via σ=3.2). The “key switching matrix” then consists of the single column vector

$W[s^{\prime} \rightarrow s] = \begin{pmatrix} b_{s,s^{\prime}} \\ a_{s,s^{\prime}} \end{pmatrix}, \quad \text{where } b_{s,s^{\prime}} \overset{def}{=} \left[ s \cdot a_{s,s^{\prime}} + 2e_{s,s^{\prime}} + P s^{\prime} \right]_{Q}. \qquad (4)$

Note that W above is defined modulo Q=Pq_(L-2), but we need to use it relative to Q_(t)=Pq_(t) for whatever the current level t is. Hence before applying the key switching procedure at level t, we reduce W modulo Q_(t) to get

$W_{t}\overset{def}{=}{\lbrack W\rbrack_{Q_{t}}.}$

It is important to note that since Q_(t) divides Q then W_(t) is indeed a key-switching matrix. Namely it is of the form (b,a)^(T) with a∈𝔸_(Q_(t)) and b=[s·a+2e_(s,s′)+Ps′]_(Q_(t)) (with respect to the same element e_(s,s′)∈𝔸 from above).

The SwitchKey Procedure

Given the extended ciphertext c=((d₀,d₁,d₂),t,v) and the key-switching matrix W_(t)=(b,a)^(T), the procedure SwitchKey_(W) _(t) (c) proceeds as shown in FIG. 6. For simplicity we describe the SwitchKey procedure as if it always switches back to mod-q_(t), but in reality if the noise estimate is large enough then it can switch directly to q_(t-1) instead.

To argue correctness, observe that although the “actual key switching operation” from above looks superficially different from the standard key-switching operation c′←W·c, it is merely an optimization that takes advantage of the fact that both vectors s′ and s share the element s. Indeed, we have the equality over 𝔸_(Q_(t)):

c′₀−s·c′₁=(P·d₀+d₂·b_(s,s′))−s·(P·d₁+d₂·a_(s,s′))

=P·(d₀−s·d₁−s′d₂)+2·d₂·e_(s,s′),

so as long as both sides are smaller than Q_(t) we have the same equality also over 𝔸 (without the mod-Q_(t) reduction), which means that we get

[c′₀−s·c′₁]_(Q_(t))=[P·(d₀−s·d₁−s′d₂)+2·d₂·e_(s,s′)]_(Q_(t))≡[d₀−s·d₁−s′d₂]_(Q_(t)) (mod 2).

To analyze the size of the added term 2d₂e_(s,s′), we can assume heuristically that d₂ behaves like a uniform polynomial drawn from U_(q_(t)), hence d₂(ζ_(m)) for a complex root of unity ζ_(m) is distributed close to a complex Gaussian with variance q_(t)²φ(m)/12. Similarly e_(s,s′)(ζ_(m)) is distributed close to a complex Gaussian with variance σ²φ(m), so 2d₂(ζ_(m))e_(s,s′)(ζ_(m)) can be modeled as a product of two Gaussians, and we expect that with overwhelming probability it remains smaller than

$2 \cdot 16 \cdot \sqrt{q_{t}^{2}\varphi(m)/12 \cdot \sigma^{2}\varphi(m)} = \frac{16}{\sqrt{3}} \cdot \sigma q_{t} \varphi(m).$

This yields a heuristic bound (16/√3)·σφ(m)·q_(t)=B_(Ks)·q_(t) on the canonical embedding norm of the added noise term, and if the total noise magnitude does not exceed Q_(t)/2c_(m) then also in coefficient representation everything remains below Q_(t)/2. Thus our constant B_(Ks) is set as

$\begin{matrix} {{\frac{16\sigma \; {\varphi (m)}}{\sqrt{3}} \approx {9\; \sigma \; {\varphi (m)}}} = B_{Ks}} & (5) \end{matrix}$
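As a numeric check of Equation (5), the sketch below evaluates the product-of-Gaussians bound from the two variances and compares it with B_(Ks)·q_(t). The function names and the sample values φ(m)=4096 and q_(t)=2⁴⁰ are assumptions chosen only for illustration.

```python
import math


def key_switch_added_noise(sigma: float, phi_m: int, q_t: int) -> float:
    """2 * 16 * sqrt(Var[d_2(zeta)] * Var[e(zeta)]) for the term 2*d_2*e."""
    var_d2 = q_t ** 2 * phi_m / 12       # uniform polynomial modulo q_t
    var_e = sigma ** 2 * phi_m           # discrete Gaussian polynomial
    return 2 * 16 * math.sqrt(var_d2 * var_e)


def b_ks(sigma: float, phi_m: int) -> float:
    """B_Ks = 16 * sigma * phi(m) / sqrt(3), roughly 9 * sigma * phi(m)."""
    return 16 * sigma * phi_m / math.sqrt(3)


sigma, phi_m, q_t = 3.2, 4096, 2 ** 40   # illustrative values only
print(key_switch_added_noise(sigma, phi_m, q_t) / q_t, b_ks(sigma, phi_m))
```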

Finally, dividing by P (which is the effect of the Scale operation), we obtain the final ciphertext that we require, and the noise magnitude is divided by P (except for the added B_(scale) term).

6.4 Key-Generation, Encryption, and Decryption

The procedures below depend on many parameters, h, σ, m, the primes p_(i) and P, etc. These parameters will be determined later.

KeyGen( ): Given the parameters, the key generation procedure chooses a low-weight secret key and then generates an LWE instance relative to that secret key. Namely, we choose

s←HWT(h), a←U_(q_(L-1)), and e←DG_(q_(L-1))(σ²).

Then set the secret key as s and the public key as (a,b) where b=[a·s+2e]_(q) _(L-1) .

In addition, the key generation procedure adds to the public key some key-switching “matrices”, as described in Section 6.3. Specifically the matrix W[s²→s] for use in multiplication, and some matrices W[κ_(i)(s)→s] for use in automorphisms, for κ_(i)∈Gal whose indexes generate (ℤ/mℤ)* (including in particular κ₂).

Enc_(pk)(m): To encrypt an element m∈𝔸₂, we choose one “small polynomial” (with 0, ±1 coefficients) and two Gaussian polynomials (with variance σ²),

v←ZO(0.5) and e₀,e₁←DG_(q_(L-1))(σ²).

Then we set c₀=b·v+2·e₀+m, c₁=a·v+2·e₁, and set the initial ciphertext as c′=(c₀, c₁, L−1, B_(clean)), where B_(clean) is a parameter that we determine below.

The noise magnitude in this ciphertext (B_(clean)) is a little larger than what we would like, so before we start computing on the ciphertext we do one modulus-switch. That is, the encryption procedure sets c←SwitchModulus(c′) and outputs c. We can deduce a value for B_(clean) as follows:

c₀ − s ⋅ c₁_(q_(t))^(can) ≤ c₀ − s ⋅ c₁_(∞)^(can) = ((a ⋅ s + 2 ⋅ e) ⋅ v + 2 ⋅ e₀ + m − (a ⋅ v + 2 ⋅ e₁) ⋅ s_(∞)^(can) = m + 2 ⋅ (e ⋅ v + e₀ − e₁ ⋅ s)_(∞)^(can) ≤ m_(∞)^(can) + 2 ⋅ (e ⋅ v_(∞)^(can) + e₀_(∞)^(can) + e₁ ⋅ s_(∞)^(can)).

Using our complex Gaussian heuristic from Section 5.5, we can bound the canonical embedding norm of the randomized terms above by

∥e·v∥_(∞)^(can)≦16σφ(m)/√2, ∥e₀∥_(∞)^(can)≦6σ√(φ(m)), ∥e₁·s∥_(∞)^(can)≦16σ√(h·φ(m)).

Also, the norm of the input message m is clearly bounded by φ(m), hence (when we substitute our parameters h=64 and σ=3.2) we get the bound

φ(m)+32σφ(m)/√2+12σ√(φ(m))+32σ√(h·φ(m))≈74φ(m)+858√(φ(m))=B_(clean)  (6)
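The two coefficients in Equation (6) follow from substituting σ=3.2 and h=64. A small sketch of that substitution (the function name is ours):

```python
import math


def b_clean_coefficients(sigma: float, h: int):
    """Coefficients of phi(m) and sqrt(phi(m)) in the bound B_clean."""
    linear = 1 + 32 * sigma / math.sqrt(2)            # ||m|| + 2*||e*v||
    root = 12 * sigma + 32 * sigma * math.sqrt(h)     # 2*||e_0|| + 2*||e_1*s||
    return linear, root


linear, root = b_clean_coefficients(sigma=3.2, h=64)
print(f"B_clean ~ {linear:.1f}*phi(m) + {root:.1f}*sqrt(phi(m))")
```

The linear coefficient comes out slightly below 74, so the constant 74 used in Equation (6) is a rounded-up value.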

Our goal in the initial modulus switching from q_(L-1) to q_(L-2) is to reduce the noise from its initial level of B_(clean)=Θ(φ(m)) to our base-line bound of B=Θ(√(φ(m))) which is determined in Equation (12) below.

Dec_(pk)(c): Decryption of a ciphertext (c₀,c₁,t,v) at level t is performed by setting m′←[c₀−s·c₁]_(q_(t)), then converting m′ to coefficient representation and outputting m′ mod 2. This procedure works when c_(m)·v<q_(t)/2, so it only applies when the ring constant c_(m) (cf. Section 5.2) is known and relatively small (which as we mentioned above will be true for all practical parameters). Also, we must pick the smallest prime q₀=p₀ large enough, as described in Section 7.2.
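The KeyGen, Enc and Dec equations above can be exercised end to end on a toy instance. The sketch below makes several simplifying assumptions that are not part of the described scheme: it works in ℤ[X]/Φ₅(X) (so φ(m)=4), uses a single modulus q=13981, draws every small polynomial with coefficients uniform in {−1,0,1} instead of HWT(h), ZO(0.5) or the discrete Gaussian, and omits the initial modulus switch and the noise estimate. These parameters offer no security; they only show that [c₀−s·c₁]_(q) mod 2 recovers the message.

```python
import random

N = 4  # phi(5): we work in Z[X]/Phi_5(X), Phi_5 = X^4 + X^3 + X^2 + X + 1


def mul(a, b):
    """Multiply two polynomials modulo Phi_5(X)."""
    r = [0] * 5
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[(i + j) % 5] += ai * bj       # reduce modulo X^5 - 1 first
    return [r[i] - r[4] for i in range(N)]  # then X^4 = -(1 + X + X^2 + X^3)


def centered(a, q):
    """Reduce coefficients into (-q/2, q/2] for odd q."""
    out = []
    for x in a:
        x %= q
        out.append(x - q if x > q // 2 else x)
    return out


def small(rng):
    """Toy stand-in for the small distributions: coefficients in {-1, 0, 1}."""
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def keygen(q, rng):
    s = small(rng)
    a = [rng.randrange(q) for _ in range(N)]
    e = small(rng)
    b = [(x + 2 * y) % q for x, y in zip(mul(a, s), e)]     # b = [a*s + 2e]_q
    return s, (a, b)


def enc(pk, m, q, rng):
    a, b = pk
    v, e0, e1 = small(rng), small(rng), small(rng)
    c0 = [(x + 2 * y + z) % q for x, y, z in zip(mul(b, v), e0, m)]  # b*v+2e0+m
    c1 = [(x + 2 * y) % q for x, y in zip(mul(a, v), e1)]            # a*v+2e1
    return c0, c1


def dec(s, c, q):
    c0, c1 = c
    noise = centered([x - y for x, y in zip(c0, mul(s, c1))], q)     # [c0-s*c1]_q
    return [x % 2 for x in noise]


rng = random.Random(1)
q = 13981
s, pk = keygen(q, rng)
message = [1, 0, 1, 1]
print(dec(s, enc(pk, message, q, rng), q) == message)
```

With these toy distributions the noise m+2(e·v+e₀−e₁·s) has coefficients of magnitude at most 35, far below q/2, so decryption always succeeds here.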

6.5 Homomorphic Operations

Add(c,c′): Given two ciphertexts c=((c₀,c₁),t,v) and c′=((c′₀,c′₁),t′,v′), representing messages m, m′∈𝔸₂, this algorithm forms a ciphertext c_(a)=((a₀,a₁),t_(a),v_(a)) which encrypts the message m_(a)=m+m′.

If the two ciphertexts do not belong to the same level then we reduce the larger one modulo the smaller of the two moduli, thus bringing them to the same level. (This simple modular reduction works as long as the noise magnitude is smaller than the smaller of the two moduli; if this condition does not hold then we need to do modulus switching rather than simple modular reduction.) Once the two ciphertexts are at the same level (call it t″), we just add the two ciphertext vectors and the two noise estimates to get

c_(a)=(([c₀+c′₀]_(q_(t″)),[c₁+c′₁]_(q_(t″))),t″,v+v′).
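The level alignment and noise bookkeeping of Add can be sketched on coefficient vectors. The `Ciphertext` container and `add` function below are hypothetical names; the sketch assumes the simple-reduction case (noise smaller than the smaller modulus) and, for brevity, reduces into [0, q) rather than the centered interval.

```python
from dataclasses import dataclass


@dataclass
class Ciphertext:
    c0: list      # coefficient vector of c_0
    c1: list      # coefficient vector of c_1
    t: int        # level: the ciphertext is valid modulo chain[t]
    v: float      # noise estimate


def add(x: Ciphertext, y: Ciphertext, chain: list) -> Ciphertext:
    """Bring both inputs to the lower level, then add vectors and estimates."""
    t = min(x.t, y.t)
    q = chain[t]
    # Reducing the sum modulo the smaller modulus also reduces the
    # higher-level operand, since the smaller modulus divides the larger.
    c0 = [(a + b) % q for a, b in zip(x.c0, y.c0)]
    c1 = [(a + b) % q for a, b in zip(x.c1, y.c1)]
    return Ciphertext(c0, c1, t, x.v + y.v)


chain = [11, 341, 13981]                                 # toy moduli chain
x = Ciphertext([13000, 5], [7, 13980], t=2, v=10.0)
y = Ciphertext([340, 1], [2, 3], t=1, v=4.5)
z = add(x, y, chain)
print(z)
```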

Mult(c,c′): Given two ciphertexts representing messages m, m′∈𝔸₂, this algorithm forms a ciphertext that encrypts the message m·m′.

We begin by ensuring that the noise magnitude in both ciphertexts is smaller than the pre-set constant B (which is our base-line bound and is determined in Equation (12) below), performing modulus-switching as needed to ensure this condition. Then we bring both ciphertexts to the same level by reducing modulo the smaller of the two moduli (if needed). Once both ciphertexts have small noise magnitude and the same level we form the extended ciphertext (essentially performing the tensor product of the two) and apply key-switching to get back a normal ciphertext. A pseudo-code description of this multiplication procedure is shown in FIG. 7.

We stress that the only place where we force modulus switching is before the multiplication operation. In all other operations we allow the noise to grow, and it is reduced again the first time the ciphertext is input to a multiplication operation. We also note that we may need to apply modulus switching more than once before the noise is small enough.

Scalar-Mult(c,α): Given a ciphertext c=(c₀,c₁,t,v) representing the message m, and an element α∈𝔸 (represented as a polynomial modulo 2 with coefficients in {−1,0,1}), this algorithm forms a ciphertext c_(m)=(a₀,a₁,t_(m),v_(m)) which encrypts the message m_(m)=α·m. This procedure is needed in our implementation of homomorphic AES, and is of more general interest in general computation over finite fields.

The algorithm makes use of a procedure Randomize(α) which takes α and replaces each non-zero coefficient with a coefficient chosen at random from {−1,1}. To multiply by α, we set β←Randomize(α) and then just multiply both c₀ and c₁ by β. Using the same argument as we used in Section 5.5 for the distribution HWT(h), here too we can bound the norm of β by ∥β∥_(∞)^(can)≦6√(Wt(α)) where Wt(α) is the number of nonzero coefficients of α. Hence we multiply the noise estimate by 6√(Wt(α)), and output the resulting ciphertext c_(m)=(c₀·β, c₁·β, t, v·6√(Wt(α))).
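A minimal sketch of Randomize and of the noise-estimate factor, operating on a plain coefficient list; the function names and the use of Python's `random` module are assumptions made for illustration.

```python
import math
import random


def randomize(alpha, rng=random):
    """Replace each nonzero coefficient of alpha by a random element of {-1, 1}.

    The result is congruent to alpha modulo 2, so it represents the same
    plaintext-space element.
    """
    return [rng.choice((-1, 1)) if a != 0 else 0 for a in alpha]


def noise_factor(alpha) -> float:
    """6 * sqrt(Wt(alpha)), the factor applied to the noise estimate."""
    weight = sum(1 for a in alpha if a != 0)
    return 6 * math.sqrt(weight)


alpha = [1, 0, -1, 1, 0, 0, 1]
beta = randomize(alpha, random.Random(0))
print(beta, noise_factor(alpha))
```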

Automorphism(c,κ): In the main body we explained how permutations on the plaintext slots can be realized via using elements κ∈Gal; we also require the application of such automorphisms to implement the Frobenius maps in our AES implementation.

For each κ that we want to use, we need to include in the public key the “matrix” W[κ(s)→s]. Then, given a ciphertext c=(c₀,c₁,t,v) representing the message m, the function Automorphism(c,κ) produces a ciphertext c′=(c′₀,c′₁,t,v′) which represents the message κ(m). We first set an “extended ciphertext” by setting

d₀←κ(c₀), d₁←0, and d₂←κ(c₁)

and then apply key switching to the extended ciphertext ((d₀,d₁, d₂),t,v) using the “matrix” W[κ(s)→s].

7 Security Analysis and Parameter Settings

Below we derive the concrete parameters for use in our implementation. We begin in Section 7.1 by deriving a lower-bound on the dimension N of the LWE problem underlying our key-switching matrices, as a function of the modulus and the noise-variance. (This will serve as a lower-bound on φ(m) for our choice of the ring polynomial Φ_(m)(X).) Then in Section 7.2 we derive a lower bound on the size of the largest modulus Q in our implementation, in terms of the noise variance and the dimension N. Then in Section 7.3 we choose a value for the noise variance (as small as possible subject to some nominal security concerns), solve the somewhat circular constraints on N and Q, and set all the other parameters.

7.1 Lower-Bounding the Dimension

Below we apply the LWE-security analysis of Lindner and Peikert [20], together with a few (arguably justifiable) assumptions, to analyze the dimension needed for different security levels. The analysis below assumes that we are given the modulus Q and noise variance σ² for the LWE problem (i.e., the noise is chosen from a discrete Gaussian distribution modulo Q with variance σ² in each coordinate). The goal is to derive a lower-bound on the dimension N required to get any given security level. The first assumption that we make, of course, is that the Lindner-Peikert analysis, which was done in the context of standard LWE, applies also for our ring-LWE case. We also make the following extra assumptions:

1) We assume that (once σ is not too tiny), the security depends on the ratio Q/σ and not on Q and σ separately. Nearly all the attacks and hardness results in the literature support this assumption, with the exception of the Arora-Ge attack [2] (that works whenever σ is very small, regardless of Q).

2) The analysis in [20] devised an experimental formula for the time that it takes to get a particular quality of reduced basis (i.e., the parameter δ of Gama and Nguyen [12]), then provided another formula for the advantage that the attack can derive from a reduced basis at a given quality, and finally used a computer program to solve these formulas for some given values of N and δ. This provides some time/advantage tradeoff, since obtaining a smaller value of δ (i.e., higher-quality basis) takes longer time and provides better advantage for the attacker.

For our purposes we made the assumption that the best runtime/advantage ratio is achieved in the high-advantage regime. Namely we should spend basically all the attack running time doing lattice reduction, in order to get a good enough basis that will break security with advantage (say) ½. This assumption is consistent with the results that are reported in [20].

3) Finally, we assume that to get advantage of close to ½ for an LWE instance with modulus Q and noise σ, we need to be able to reduce the basis well enough until the shortest vector is of size roughly Q/σ. Again, this is consistent with the results that are reported in [20].

Given these assumptions and the formulas from [20], we can now solve the dimension/security tradeoff analytically. Because of the first assumption we might as well simplify the equations and derive our lower bound on N for the case σ=1, where the ratio Q/σ is equal to Q. (In reality we will use σ≈4 and increase the modulus by the same 2 bits).

Following Gama-Nguyen [12], recall that a reduced basis B=(b₁|b₂| . . . |b_(M)) for a dimension-M, determinant-D lattice (with ∥b₁∥≦∥b₂∥≦ . . . ≦∥b_(M)∥), has quality parameter δ if the shortest vector in that basis has norm ∥b₁∥=δ^(M)·D^(1/M). In other words, the quality of B is defined as δ=∥b₁∥^(1/M)/D^(1/M²). The time (in seconds) that it takes to compute a reduced basis of quality δ for a random LWE instance was estimated in [20] to be at least

log(time)≧1.8/log(δ)−110.  (7)

For a random Q-ary lattice of rank N, the determinant is exactly Q^(N) whp, and therefore a quality-δ basis has ∥b₁∥=δ^(M)·Q^(N/M). By our third assumption, we should reduce the basis enough so that ∥b₁∥=Q, so we need Q=δ^(M)·Q^(N/M). The LWE attacker gets to choose the dimension M, and the best choice for this attack is obtained when the right-hand-side of the last equality is minimized, namely for M=√(N·log Q/log δ). This yields the condition

log Q=log(δ^(M)Q^(N/M))=M log δ+(N/M)log Q=2√(N log Q log δ),

which we can solve for N to get N=log Q/(4 log δ). Finally, we can use Equation (7) to express log δ as a function of log(time), thus getting N=log Q·(log(time)+110)/7.2. Recalling that in our case we used σ=1 (so Q/σ=Q), we get our lower-bound on N in terms of Q/σ. Namely, to ensure a time/advantage ratio of at least 2^(k), we need to set the rank N to be at least

$\begin{matrix} {N \geq \frac{{\log \left( {Q\text{/}\sigma} \right)}\left( {k + 110} \right)}{7.2}} & (8) \end{matrix}$

For example, the above formula says that to get 80-bit security level we need to set N≧log(Q/σ)·26.4, for 100-bit security level we need N≧log(Q/σ)·29.1, and for 128-bit security level we need N≧log(Q/σ)·33.1. We comment that these values are indeed consistent with the values reported in [20].
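The three multipliers quoted above are simply (k+110)/7.2 from Equation (8). A short sketch, with illustrative function names and a hypothetical 200-bit modulus/noise ratio as sample input:

```python
import math


def dimension_factor(k: int) -> float:
    """Multiplier of log(Q/sigma) in Equation (8) for security parameter k."""
    return (k + 110) / 7.2


def min_dimension(log2_q_over_sigma: float, k: int) -> int:
    """Smallest integer N satisfying Equation (8)."""
    return math.ceil(log2_q_over_sigma * dimension_factor(k))


for k in (80, 100, 128):
    print(k, round(dimension_factor(k), 2))

# Hypothetical input: a 200-bit modulus/noise ratio at k = 80.
print(min_dimension(200, 80))
```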

7.1.1 LWE with Sparse Key

The analysis above applies to “generic” LWE instance, but in our case we use very sparse secret keys (with only h=64 nonzero coefficients, all chosen as ±1). This brings up the question of whether one can get better attacks against LWE instances with a very sparse secret (much smaller than even the noise). We note that Goldwasser et al. proved in [16] that LWE with low-entropy secret is as hard as standard LWE with weaker parameters (for large enough moduli). Although the specific parameters from that proof do not apply to our choice of parameter, it does indicate that weak-secret LWE is not “fundamentally weaker” than standard LWE. In terms of attacks, the only attack that we could find that takes advantage of this sparse key is by applying the reduction technique of Applebaum et al. [1] to switch the key with part of the error vector, thus getting a smaller LWE error.

In a sparse-secret LWE we are given a random N-by-M matrix A (modulo Q), and also an M-vector y=[sA+e]_(Q). Here the N-vector s is our very sparse secret, and e is the error M-vector (which is also short, but not sparse and not as short as s).

Below let A₁ denote the first N columns of A, A₂ the next N columns, then A₃, A₄, etc. Similarly e₁, e₂, . . . are the corresponding parts of the error vector and y₁, y₂, . . . the corresponding parts of y. Assuming that A₁ is invertible (which happens with high probability), we can transform this into an LWE instance with respect to secret e₁, as follows:

We have y₁=sA₁+e₁, or alternatively A₁⁻¹y₁=s+A₁⁻¹e₁. Also, for i>1 we have y_(i)=sA_(i)+e_(i), which together with the above gives A_(i)A₁⁻¹y₁−y_(i)=A_(i)A₁⁻¹e₁−e_(i). Hence if we denote

$B_{1} \overset{def}{=} A_{1}^{-1}, \text{ and for } i > 1 \;\; B_{i} \overset{def}{=} A_{i}A_{1}^{-1}, \text{ and similarly}$ $z_{1} \overset{def}{=} A_{1}^{-1}y_{1}, \text{ and for } i > 1 \;\; z_{i} \overset{def}{=} A_{i}A_{1}^{-1}y_{1} - y_{i},$

and then set

${B\overset{def}{=}{{\left( B_{1}^{t} \middle| B_{2}^{t} \middle| B_{3}^{t} \middle| \ldots \right)\mspace{14mu} {and}\mspace{14mu} z}\overset{def}{=}\left( z_{1} \middle| z_{2} \middle| z_{3} \middle| \ldots \right)}},$

and also f=(s|e₂|e₃| . . . ) then we get the LWE instance

z=e ₁ ^(t) B+f

with secret e₁ ^(t). The thing that makes this LWE instance potentially easier than the original one is that the first part of the error vector f is our sparse/small vector s, so the transformed instance has smaller error than the original (which means that it is easier to solve).

Trying to quantify the effect of this attack, we note that the optimal M value in the attack from Section 7.1 above is obtained at M=2N, which means that the new error vector is f=(s|e₂), which has Euclidean norm smaller than e=(e₁|e₂) by roughly a factor of √{square root over (2)}, (assuming that ∥s∥<<∥e₁∥≈∥e₂∥). Maybe some further improvement can be obtained by using a smaller value for M, where the shorter error may outweigh the “non optimal” value of M. However, we do not expect to get major improvement this way, so it seems that the very sparse secret should only add maybe one bit to the modulus/noise ratio.

7.2 The Modulus Size

In this section we assume that we are given the parameter N=φ(m) (for our polynomial ring modulo Φ_(m)(X)). We also assume that we are given the noise variance σ², the number of levels in the modulus chain L, an additional “slackness parameter” ξ (whose purpose is explained below), and the number of nonzero coefficients in the secret key h. Our goal is to devise a lower bound on the size of the largest modulus Q used in the public key, so as to maintain the functionality of the scheme.

Controlling the Noise

Driving the analysis in this section is a bound on the noise magnitude right after modulus switching, which we denote below by B. We set our parameters so that starting from ciphertexts with noise magnitude B, we can perform one level of fan-in-two multiplications, then one level of fan-in-ξ additions, followed by key switching and modulus switching again, and get the noise magnitude back to the same B.

Recall that in the “reduced canonical embedding norm”, noise magnitudes at most multiply under modular multiplication and at most add under modular addition; hence after the multiplication and addition levels the noise magnitude grows from B to as much as ξB².

As seen in Section 6.3, performing key switching scales up the noise magnitude by a factor of P and adds another noise term of magnitude up to B_(Ks)·q_(t) (before doing modulus switching to scale the noise back down). Hence starting from noise magnitude ξB², the noise grows to magnitude PξB²+B_(Ks)·q_(t) (relative to the modulus Pq_(t)).

Below we assume that after key-switching we do modulus switching directly to a smaller modulus.

After key-switching we can switch to the next modulus q_(t-1) to decrease the noise back to our bound B. Following the analysis from Section 6.2, switching moduli from Q_(t) to q_(t-1) decreases the noise magnitude by a factor of q_(t-1)/Q_(t)=1/(P·p_(t)), and then adds a noise term of magnitude B_(scale).

Starting from noise magnitude PξB²+B_(Ks)·q_(t) before modulus switching, the noise magnitude after modulus switching is therefore bounded whp by

${\frac{{{P \cdot \xi}\; B^{2}} + {B_{Ks} \cdot q_{t}}}{P \cdot p_{t}} + B_{scale}} = {\frac{\xi \; B^{2}}{p_{t}} + \frac{B_{Ks} \cdot q_{t - 1}}{P} + B_{scale}}$

Using the analysis above, our goal next is to set the parameters B, P and the p_(t)'s (as functions of N, σ, L, ξ and h) so that in every level t we get

${\frac{\xi \; B^{2}}{p_{t}} + \frac{B_{Ks} \cdot q_{t - 1}}{P} + B_{scale}} \leq {B.}$

Namely we need to satisfy at every level t the quadratic inequality (in B)

$\begin{matrix} {{{\frac{\xi}{p_{t}}B^{2}} - B + \underset{\underset{{denote}\mspace{14mu} {this}\mspace{14mu} {by}\mspace{11mu} R_{t - 1}}{}}{\left( {\frac{B_{Ks} \cdot q_{t - 1}}{P} + B_{scale}} \right)}} \leq 0.} & (9) \end{matrix}$

Observe that (assuming that all the primes p_(t) are roughly the same size), it suffices to satisfy this inequality for the largest modulus t=L−2, since R_(t-1) increases with larger t's. Noting that R_(L-3)>B_(scale), we want to get this term to be as close to B_(scale) as possible, which we can do by setting P large enough. Specifically, to make it as close as R_(L-3)=(1+2^(−n))B_(scale) it is sufficient to set

$P \approx 2^{n}\frac{B_{Ks} \cdot q_{L-3}}{B_{scale}} \approx 2^{n}\frac{9\sigma N q_{L-3}}{77\sqrt{N}} \approx 2^{n-3} q_{L-3} \cdot \sigma\sqrt{N}, \qquad (10)$

Below we set (say) n=8, which makes it close enough to use just R_(L-3)≈B_(scale) for the derivation below.

Clearly to satisfy Inequality (9) we must have a positive discriminant, which means

$1 - 4\frac{\xi}{p_{L-2}}R_{L-3} \geq 0,$

or p_(L-2)≧4ξR_(L-3). Using the value R_(L-3)≈B_(scale), this translates into setting

p₁≈p₂≈ . . . ≈p_(L-2)≈4ξ·B_(scale)≈308ξ√N  (11)

Finally, with the discriminant positive and all the p_(i)'s roughly the same size we can satisfy Inequality (9) by setting

$\begin{matrix} {{B \approx \frac{1}{2\xi \text{/}p_{L - 2}}} = {\frac{p_{L - 2}}{2\xi} \approx {2B_{scale}} \approx {154{\sqrt{N}.}}}} & (12) \end{matrix}$

The Smallest Modulus

After evaluating our L-level circuit, we arrive at the last modulus q₀=p₀ with noise bounded by ξB². To be able to decrypt, we need this noise to be smaller than q₀/2c_(m), where c_(m) is the ring constant for our polynomial ring modulo Φ_(m)(X). For our setting, that constant is always below 40, so a sufficient condition for being able to decrypt is to set

q ₀ =p ₀≈80ξB ²≈2^(20.9) ξN  (13)
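The constants in Equations (11) to (13) can be recomputed from B_(scale)≈77√N and the bound c_(m)<40. The sketch below uses hypothetical helper names and the sample values N=4096 and ξ=8, and also checks that B=2B_(scale) sits exactly at the double root of Inequality (9) when R is taken as B_(scale).

```python
import math

B_SCALE_COEFF = 77          # B_scale ~ 77 * sqrt(N), from Equation (3)


def b_scale(n: int) -> float:
    return B_SCALE_COEFF * math.sqrt(n)


def prime_size(xi: int, n: int) -> float:
    """Equation (11): p_i ~ 4 * xi * B_scale ~ 308 * xi * sqrt(N)."""
    return 4 * xi * b_scale(n)


def baseline_bound(n: int) -> float:
    """Equation (12): B ~ 2 * B_scale ~ 154 * sqrt(N)."""
    return 2 * b_scale(n)


def smallest_modulus(xi: int, n: int, c_m_bound: int = 40) -> float:
    """Equation (13): q_0 ~ 2 * c_m * xi * B^2 with c_m below 40."""
    return 2 * c_m_bound * xi * baseline_bound(n) ** 2


xi, n = 8, 4096             # sample values for illustration
p, B = prime_size(xi, n), baseline_bound(n)
lhs = xi / p * B ** 2 - B + b_scale(n)      # left side of Inequality (9)
print(p, B, lhs, math.log2(smallest_modulus(xi, n) / (xi * n)))
```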

The Encryption Modulus

Recall that freshly encrypted ciphertexts have noise B_(clean) (as defined in Equation (6)), which is larger than our baseline bound B from above. To reduce the noise magnitude after the first modulus switching down to B, we therefore set the ratio p_(L-1)=q_(L-1)/q_(L-2) so that B_(clean)/p_(L-1)+B_(scale)≦B. This means that we set

$\begin{matrix} {p_{L - 1} = {\frac{B_{clean}}{B - B_{scale}} \approx \frac{{74N} + {858\sqrt{N}}}{77\sqrt{N}} \approx {\sqrt{N} + 11}}} & (14) \end{matrix}$

The Largest Modulus

Having set all the parameters, we are now ready to calculate the resulting bound on the largest modulus, namely Q_(L-2)=q_(L-2)·P. Using Equations (11) and (13), we get

$q_{t} = p_{0} \cdot \prod_{i=1}^{t} p_{i} \approx (2^{20.9}\xi N) \cdot (308\xi\sqrt{N})^{t} = 2^{20.9} \cdot 308^{t} \cdot \xi^{t+1} \cdot N^{t/2+1}. \qquad (15)$

Now using Equation (10) we have

$P \approx {2^{5}q_{L - 3}\sigma \sqrt{N}} \approx {{2^{25.9} \cdot 308^{L - 3} \cdot \xi^{L - 2} \cdot N^{{{({L - 3})}/2} + 1} \cdot \sigma}\sqrt{N}} \approx {{2 \cdot 308^{L} \cdot \xi^{L - 2}}\sigma \; N^{L/2}}$

and finally

$Q_{L-2} = P \cdot q_{L-2} \approx \left( 2 \cdot 308^{L} \cdot \xi^{L-2} \sigma N^{L/2} \right) \cdot \left( 2^{20.9} \cdot 308^{L-2} \cdot \xi^{L-1} \cdot N^{L/2} \right) \approx \sigma \cdot 2^{16.5L+5.4} \cdot \xi^{2L-3} \cdot N^{L} \qquad (16)$

7.3 Putting it Together

We now have in Equation (8) a lower bound on N in terms of Q, σ and the security level k, and in Equation (16) a lower bound on Q with respect to N, σ and several other parameters. We note that σ is a free parameter, since it drops out when substituting Equation (16) in Equation (8). In our implementation we used σ=3.2, which is the smallest value consistent with the analysis in [23].

For the other parameters, we set ξ=8 (to get a small “wiggle room” without increasing the parameters much), and set the number of nonzero coefficients in the secret key at h=64 (which is already included in the formulas from above, and should easily defeat exhaustive-search/birthday type of attacks). Substituting these values into the equations above we get

p₀≈2^(23.9)N, p_(i)≈2^(11.3)√N for i=1, . . . ,L−2

P≈2^(11.3L-5) N ^(L/2), and Q _(L-2)≈2^(22.5L-3.6) σN ^(L).

Substituting the last value of Q_(L-2) into Equation (8) yields

$\begin{matrix} {N > \frac{\left( {{L\left( {{\log \; N} + 23} \right)} - 8.5} \right)\left( {k + 110} \right)}{7.2}} & (17) \end{matrix}$

Targeting k=80 bits of security and solving for several different depth parameters L, we get the results in the table of FIG. 8, which also lists approximate sizes for the primes p_(i) and P.
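Because N appears on both sides of Equation (17), the bound has to be solved numerically. A minimal sketch of one way to do this is shown below, assuming that log N denotes the base-2 logarithm; the names `rhs` and `min_dimension` are hypothetical. The iteration repeatedly replaces N by the smallest integer exceeding the right-hand side, which increases monotonically until the inequality holds.

```python
import math

def rhs(N, L, k):
    """Right-hand side of Equation (17), taking log N as log base 2 (assumption)."""
    return (L * (math.log2(N) + 23) - 8.5) * (k + 110) / 7.2

def min_dimension(L, k=80):
    """Smallest integer N with N > rhs(N, L, k), found by fixed-point iteration."""
    N = 2
    while N <= rhs(N, L, k):
        # Jump to the smallest integer strictly above the current bound.
        N = math.floor(rhs(N, L, k)) + 1
    return N

n10 = min_dimension(10, 80)
```

The value returned is only a lower bound on φ(m); the concrete m is then chosen as described in the next subsection.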

Choosing Concrete Values

Having obtained lower bounds on N=φ(m) and other parameters, we now need to fix precise cyclotomic fields ℚ(ζ_(m)) to support the algebraic operations we need. We have two situations we will be interested in for our experiments. The first corresponds to performing arithmetic on bytes in 𝔽_(2^8) (i.e. n=8), whereas the second corresponds to arithmetic on bits in 𝔽_(2) (i.e. n=1). See FIG. 9. We therefore need to find an odd value of m, with φ(m)≈N and m dividing 2^(d)−1, where we require that d is divisible by n. Values of m with a small number of prime factors are preferred as they give rise to smaller values of c_(m). We also look for parameters which maximize the number of slots l we can deal with in one go, and values for which φ(m) is close to the approximate value for N estimated above. When n=1 we always select a set of parameters for which the l value is at least as large as that obtained when n=8.
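The search for m can be sketched as a brute-force scan. The sketch below makes two assumptions that are not stated in the text: d is taken to be the multiplicative order of 2 modulo m (the smallest d with m dividing 2^(d)−1), and the slot count is computed as l=φ(m)/d. The function names and the tolerance parameter `tol` are hypothetical.

```python
def euler_phi(m):
    """Euler's totient function by trial division."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def order_of_two(m):
    """Smallest d >= 1 with 2^d = 1 (mod m); requires odd m >= 3."""
    d, x = 1, 2 % m
    while x != 1:
        x = (x * 2) % m
        d += 1
    return d

def candidates(n_target, n, lo, hi, tol=0.1):
    """Odd m in [lo, hi) with phi(m) within tol of n_target and n dividing d.

    Returns tuples (m, phi(m), d, slots) with slots = phi(m) // d.
    """
    found = []
    for m in range(max(3, lo | 1), hi, 2):
        phi = euler_phi(m)
        if abs(phi - n_target) > tol * n_target:
            continue
        d = order_of_two(m)
        if d % n == 0:
            found.append((m, phi, d, phi // d))
    return found

small = candidates(128, 8, 200, 300)
```

A real search would additionally rank the candidates by the number of prime factors of m and by the slot count, as described above; those ranking rules are omitted here.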

8 Scale(c,q_(t),q_(t-1)) in dble-CRT Representation

Let q_(i)=Π_(j=0)^(i)p_(j), where the p_(j)'s are primes that split completely in our cyclotomic field 𝔸. We are given a c∈𝔸_(q_(t)) represented via double-CRT; that is, it is represented as a “matrix” of its evaluations at the primitive m-th roots of unity modulo the primes p₀, . . . , p_(t). We want to modulus switch to q_(t-1), i.e., scale down by a factor of p_(t). Let's recall what this means: we want to output c′∈𝔸_(q_(t-1)), represented via double-CRT format (as its matrix of evaluations modulo the primes p₀, . . . , p_(t-1)), such that

1. c′=c mod 2.

2. c′ is very close (in terms of its coefficient vector) to c/p_(t).
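The two conditions can be illustrated on a plain coefficient vector. The sketch below is a toy model only: it works on integer coefficients rather than on the double-CRT matrix, it ignores the final reduction modulo q_(t-1), and the name `scale_down_mod2` is hypothetical. For each coefficient it picks an even δ with δ=−c mod p_(t), so that c+δ is divisible by p_(t) and keeps the parity of c.

```python
def scale_down_mod2(coeffs, p_t):
    """Scale integer coefficients by 1/p_t while preserving each value mod 2.

    p_t must be an odd prime. Returns c' with c' = c (mod 2) and |c' - c/p_t| < 1.
    """
    scaled = []
    for c in coeffs:
        r = c % p_t
        if r > p_t // 2:           # centre the residue around zero
            r -= p_t
        delta = -r                 # delta = -c (mod p_t)
        if delta % 2 != 0:         # make delta even by adding or subtracting the odd p_t
            delta += p_t if delta < 0 else -p_t
        scaled.append((c + delta) // p_t)   # exact division: c + delta = 0 (mod p_t)
    return scaled

c = [1234, -567, 89, 0, 4001, -16]
c_prime = scale_down_mod2(c, 17)
```

Since p_(t) is odd, dividing the even-adjusted value c+δ by p_(t) does not change its parity, which is why c′=c mod 2 holds.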

Above, we explained how this could be performed in dble-CRT representation. This made explicit use of the fact that the two ciphertexts need to be equivalent modulo two. If we wished to replace two with a general prime p, then things are a bit more complicated. For completeness, although it is not required in our scheme, we present a methodology below. In this case, the conditions on c^(†) are as follows:

1. c^(†)=c·p_(t) mod p.

2. c^(†) is very close to c.

3. c^(†) is divisible by p_(t).

As before, we set c′←c^(†)/p_(t). (Note that for p=2, we trivially have c·p_(t)=c mod p, since p_(t) will be odd.)

This causes some complications, because we set c^(†)←c+δ, where δ=−c mod p_(t) (as before) but now δ=(p_(t)−1)·c mod p. To compute such a δ, we need to know c mod p. Unfortunately, we don't have c mod p. One not-very-satisfying way of dealing with this problem is the following. Set ĉ←[p_(t)]_(p)·c mod q_(t). Now, if c encrypted m, then ĉ encrypts [p_(t)]_(p)·m, and ĉ's noise is [p_(t)]_(p)<p/2 times as large. It is obviously easy to compute ĉ's double-CRT format from c's. Now, we set c^(†) so that the following is true:

1. c^(†)=ĉ mod p.

2. c^(†) is very close to ĉ.

3. c^(†) is divisible by p_(t).

This is easy to do. The algorithm to output c^(†) in double-CRT format is as follows:

1. Set c̄ to be the coefficient representation of ĉ mod p_(t). (Computing this requires a single “small FFT” modulo the prime p_(t).)

2. Set δ to be the polynomial with coefficients in (−p_(t)·p/2, p_(t)·p/2] such that δ=0 mod p and δ=−c̄ mod p_(t).

3. Set c^(†)=ĉ+δ, and output c^(†)'s double-CRT representation.

-   (a) We already have ĉ's double-CRT representation.
-   (b) Computing δ's double-CRT representation requires t “small FFTs” modulo the p_(j)'s.
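Step 2 of the algorithm is a coefficient-wise application of the Chinese Remainder Theorem. A minimal sketch follows, again on plain integer coefficient vectors rather than on the double-CRT matrix; the names `crt_delta` and `scale_general` are hypothetical, and `pow(p, -1, p_t)` requires Python 3.8 or later.

```python
def crt_delta(c_bar, p, p_t):
    """Per coefficient, find delta in (-p*p_t/2, p*p_t/2] with
    delta = 0 (mod p) and delta = -c_bar (mod p_t); p and p_t are coprime."""
    inv_p = pow(p, -1, p_t)              # p^{-1} mod p_t
    delta = []
    for c in c_bar:
        u = (-c * inv_p) % p_t           # p*u = -c (mod p_t)
        d = p * u                        # multiple of p, in [0, p*p_t)
        if 2 * d > p * p_t:              # shift into the centred interval
            d -= p * p_t
        delta.append(d)
    return delta

def scale_general(c_hat, p, p_t):
    """Return (c_hat + delta) / p_t, which is an exact integer division."""
    c_bar = [x % p_t for x in c_hat]     # step 1: c_hat mod p_t
    delta = crt_delta(c_bar, p, p_t)     # step 2
    return [(x + d) // p_t for x, d in zip(c_hat, delta)]   # step 3, then divide

p, p_t = 3, 17
c_hat = [40, -7, 123, 0, 16]
delta = crt_delta([x % p_t for x in c_hat], p, p_t)
c_out = scale_general(c_hat, p, p_t)
```

For p=2 the interval becomes (−p_(t), p_(t)], which matches the range used in the mod-2 case.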

9 Other Optimizations

Some other optimizations that we encountered during our implementation work are discussed next. Not all of these optimizations are useful for our current implementation, but they may be useful in other contexts.

Three-Way Multiplications

Sometimes we need to multiply several ciphertexts together, and if their number is not a power of two then we do not have a complete binary tree of multiplications, which means that at some point in the process we will have three ciphertexts that we need to multiply together.

The standard way of implementing this 3-way multiplication is via two 2-argument multiplications, e.g., x·(y·z). But it turns out that here it is better to use “raw multiplication” to multiply these three ciphertexts (as done in [7]), thus getting an “extended” ciphertext with four elements, then apply key-switching (and later modulus switching) to this ciphertext. This takes only six ring-multiplication operations (as opposed to eight according to the standard approach), three modulus switching (as opposed to four), and only one key switching (applied to this 4-element ciphertext) rather than two (which are applied to 3-element extended ciphertexts). All in all, this three-way multiplication takes roughly 1.5 times a standard two-element multiplication.

We stress that this technique is not useful for larger products, since for more than three multiplicands the noise begins to grow too large. But with only three multiplicands we get noise of roughly B³ after the multiplication, which can be reduced to noise≈B by dropping two levels, and this is also what we get by using two standard two-element multiplications.
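The shape of the “extended” ciphertext can be seen in a toy model. The sketch below assumes, purely for illustration, that ring elements are integers modulo a small q and that a two-element ciphertext (c₀, c₁) is read as the linear polynomial c₀+c₁·s in the secret key s. It does not model noise, key switching, modulus switching, or the six-multiplication schedule mentioned above; all names are hypothetical.

```python
def poly_mul(a, b, q):
    """Multiply two polynomials in the secret-key variable, coefficients mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % q
    return out

def evaluate(ct, s, q):
    """Toy 'decryption': evaluate the ciphertext polynomial at the secret key s."""
    return sum(c * pow(s, k, q) for k, c in enumerate(ct)) % q

def raw_multiply3(x, y, z, q):
    """Raw product of three 2-element ciphertexts: a 4-element extended ciphertext."""
    return poly_mul(poly_mul(x, y, q), z, q)

q, s = 97, 5
x, y, z = [3, 4], [7, 1], [2, 9]
ext = raw_multiply3(x, y, z, q)
```

The product has four coefficients (degrees 0 through 3 in s), which is why a single key-switching step applied to a 4-element ciphertext suffices in the approach described above.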

Commuting Automorphisms and Multiplications.

Recalling that the automorphisms X ↦ X^(i) commute with the arithmetic operations, we note that some orderings of these operations can sometimes be better than others. For example, it may be better to perform the multiplication-by-constant before the automorphism operation whenever possible. The reason is that if we perform the multiply-by-constant after the key-switching that follows the automorphism, then the added noise term due to that key-switching is multiplied by the same constant, thereby making the noise slightly larger. We note that to move the multiplication-by-constant before the automorphism, we need to multiply by a different constant.

Switching to Higher-Level Moduli.

We note that it may be better to perform automorphisms at a higher level, in order to make the added noise term due to key-switching small with respect to the modulus. On the other hand operations at high levels are more expensive than the same operations at a lower level. A good rule of thumb is to perform the automorphism operations one level above the lowest one. Namely, if the naive strategy that never switches to higher-level moduli would perform some Frobenius operation at level q_(i), then we perform the key-switching following this Frobenius operation at level Q_(i+1), and then switch back to level q_(i+1) (rather than using Q_(i) and q_(i)).

Commuting Addition and Modulus-Switching.

When we need to add many terms that were obtained from earlier operations (and their subsequent key-switching), it may be better to first add all of these terms relative to the large modulus Q_(i) before switching the sum down to the smaller q_(i) (as opposed to switching all the terms individually to q_(i) and then adding).

Reducing the Number of Key-Switching Matrices.

When using many different automorphisms κ_(i): X ↦ X^(i), we need to keep many different key-switching matrices in the public key, one for every value of i that we use. We can reduce this memory requirement, at the expense of taking longer to perform the automorphisms. We use the fact that the Galois group 𝒢al that contains all the maps κ_(i) (which is isomorphic to (ℤ/mℤ)*) is generated by a relatively small number of generators. (Specifically, for our choice of parameters the group (ℤ/mℤ)* has two or three generators.) It is therefore enough to store in the public key only the key-switching matrices corresponding to κ_(g_(j))'s for these generators g_(j) of the group 𝒢al. Then in order to apply a map κ_(i) we express it as a product of the generators and apply these generators to get the effect of κ_(i). (For example, if i=g₁²·g₂ then we need to apply κ_(g₁) twice followed by a single application of κ_(g₂).)
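Expressing an index i as a product of generators can be sketched as a small exhaustive search over exponent tuples. The example below uses m=15 with generators 2 and 14 of (ℤ/15ℤ)*; this choice, and the names `element_order` and `decompose`, are assumptions made for illustration only.

```python
from itertools import product
from math import gcd

def element_order(g, m):
    """Multiplicative order of g modulo m (g must be coprime to m)."""
    k, x = 1, g % m
    while x != 1:
        x = (x * g) % m
        k += 1
    return k

def decompose(i, m, gens):
    """Return exponents (e_1, ..., e_r) with i = prod g_j^e_j (mod m), or None."""
    orders = [element_order(g, m) for g in gens]
    for exps in product(*(range(o) for o in orders)):
        acc = 1
        for g, e in zip(gens, exps):
            acc = (acc * pow(g, e, m)) % m
        if acc == i % m:
            return exps
    return None

m, gens = 15, (2, 14)
exps_13 = decompose(13, m, gens)   # 13 = 2 * 14 (mod 15)
```

Each exponent e_(j) is the number of times κ_(g_(j)) is applied, so the sum of the exponents gives the number of automorphism and key-switching steps traded for the smaller public key.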

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Acronyms that appear in the text or drawings are defined as follows.

-   AES Advanced Encryption Standard
-   BGV Brakerski, Gentry, and Vaikuntanathan
-   CRT Chinese Remainder Theorem
-   FFT Fast Fourier Transform
-   FHE Fully Homomorphic Encryption
-   GMP GNU Multiple Precision Arithmetic Library
-   HE Homomorphic Encryption
-   LWE Learning With Errors
-   NTL Number Theory Library
-   SIMD Single Instruction, Multiple Data
-   whp with high probability

REFERENCES

-   [1] Benny Applebaum, David Cash, Chris Peikert, and Amit Sahai. Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In CRYPTO, volume 5677 of Lecture Notes in Computer Science, pages 595-618. Springer, 2009.
-   [2] Sanjeev Arora and Rong Ge. New algorithms for learning in the presence of errors. In ICALP, volume 6755 of Lecture Notes in Computer Science, pages 403-415. Springer, 2011.
-   [3] Joan Boyar and Rene Peralta. A depth-16 circuit for the AES S-box. Manuscript, eprint.iacr.org/2011/332, 2011.
-   [4] Zvika Brakerski. Fully homomorphic encryption without modulus switching from classical GapSVP. Manuscript, eprint.iacr.org/2012/078, 2012.
-   [5] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. Fully homomorphic encryption without bootstrapping. In Innovations in Theoretical Computer Science (ITCS'12), 2012. Available at eprint.iacr.org/2011/277.
-   [6] Zvika Brakerski and Vinod Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE. In FOCS'11. IEEE Computer Society, 2011.
-   [7] Zvika Brakerski and Vinod Vaikuntanathan. Fully homomorphic encryption from ring-LWE and security for key dependent messages. In Advances in Cryptology – CRYPTO 2011, volume 6841 of Lecture Notes in Computer Science, pages 505-524. Springer, 2011.
-   [8] Jean-Sebastien Coron, Avradip Mandal, David Naccache, and Mehdi Tibouchi. Fully homomorphic encryption over the integers with shorter public keys. In Advances in Cryptology – CRYPTO 2011, volume 6841 of Lecture Notes in Computer Science, pages 487-504. Springer, 2011.
-   [9] Jean-Sebastien Coron, David Naccache, and Mehdi Tibouchi. Public key compression and modulus switching for fully homomorphic encryption over the integers. In Advances in Cryptology – EUROCRYPT 2012, volume 7237 of Lecture Notes in Computer Science, pages 446-464. Springer, 2012.
-   [10] Ivan Damgård and Marcel Keller. Secure multiparty AES. In Proc. of Financial Cryptography 2010, volume 6052 of LNCS, pages 367-374, 2010.
-   [11] Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. Manuscript, 2011.
-   [12] Nicolas Gama and Phong Q. Nguyen. Predicting lattice reduction. In EUROCRYPT, volume 4965 of Lecture Notes in Computer Science, pages 31-51. Springer, 2008.
-   [13] Craig Gentry. Fully homomorphic encryption using ideal lattices. In Michael Mitzenmacher, editor, STOC, pages 169-178. ACM, 2009.
-   [14] Craig Gentry and Shai Halevi. Implementing Gentry's fully-homomorphic encryption scheme. In EUROCRYPT, volume 6632 of Lecture Notes in Computer Science, pages 129-148. Springer, 2011.
-   [15] Craig Gentry, Shai Halevi, and Nigel Smart. Fully homomorphic encryption with polylog overhead. In EUROCRYPT, volume 7237 of Lecture Notes in Computer Science, pages 465-482. Springer, 2012. Full version at eprint.iacr.org/2011/566.
-   [16] Shafi Goldwasser, Yael Tauman Kalai, Chris Peikert, and Vinod Vaikuntanathan. Robustness of the learning with errors assumption. In Innovations in Computer Science – ICS'10, pages 230-240. Tsinghua University Press, 2010.
-   [17] Yan Huang, David Evans, Jonathan Katz, and Lior Malka. Faster secure two-party computation using garbled circuits. In USENIX Security Symposium, 2011.
-   [18] J. B. Nielsen, P. S. Nordholt, C. Orlandi, and S. Sheshank. A new approach to practical active-secure two-party computation. Manuscript, 2011.
-   [19] Kristin Lauter, Michael Naehrig, and Vinod Vaikuntanathan. Can homomorphic encryption be practical? In CCSW, pages 113-124. ACM, 2011.
-   [20] Richard Lindner and Chris Peikert. Better key sizes (and attacks) for LWE-based encryption. In CT-RSA, volume 6558 of Lecture Notes in Computer Science, pages 319-339. Springer, 2011.
-   [21] Adriana López-Alt, Eran Tromer, and Vinod Vaikuntanathan. On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In STOC. ACM, 2012.
-   [22] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. On ideal lattices and learning with errors over rings. In EUROCRYPT, volume 6110 of Lecture Notes in Computer Science, pages 1-23, 2010.
-   [23] Daniele Micciancio and Oded Regev. Lattice-based cryptography, pages 147-192. Springer, 2009.
-   [24] Benny Pinkas, Thomas Schneider, Nigel P. Smart, and Steven C. Williams. Secure two-party computation is practical. In Proc. ASIACRYPT 2009, volume 5912 of LNCS, pages 250-267, 2009.
-   [25] Matthieu Rivain and Emmanuel Prouff. Provably secure higher-order masking of AES. In CHES, volume 6225 of Lecture Notes in Computer Science, pages 413-427. Springer, 2010.
-   [26] Nigel P. Smart and Frederik Vercauteren. Fully homomorphic encryption with relatively small key and ciphertext sizes. In Public Key Cryptography – PKC'10, volume 6056 of Lecture Notes in Computer Science, pages 420-443. Springer, 2010.
-   [27] Nigel P. Smart and Frederik Vercauteren. Fully homomorphic SIMD operations. Manuscript at eprint.iacr.org/2011/133, 2011.

What is claimed is:
 1. A method, comprising: performing a homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using an encryption scheme that includes a plurality of integer moduli, where each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer, where each ciphertext which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q, and where content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X), and where performing the homomorphic evaluation of the function further comprises performing one or more operations using one or more matrices from one or more of the ciphertexts.
 2. The method of claim 1, where the one or more operations comprise homomorphic multiplication operations of two ciphertexts performed by entry-by-entry multiplication of matrices from the two ciphertexts.
 3. The method of claim 1, where the one or more operations comprise automorphism of a ciphertext performed by permuting columns of the matrices from the ciphertext.
 4. The method of claim 1, where the plurality of moduli consist of products of smaller primes p_(i), where the t-th modulus q_(t) is the product of the first t smaller primes, q_(t)=Π_(i=1) ^(t)p_(i).
 5. The method of claim 4, where for each small prime p_(i), p_(i)−1 is divisible by m, where m is an integer defining the m-th cyclotomic number field.
 6. The method of claim 4, where the one or more operations comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, and where performing the modulus switching operation comprises scaling down each element a(X) of the m'th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises: setting ā(X) to be a coefficient representation of a(X) mod p_(t); performing one of adding or subtracting p_(t) from every odd coefficient of ā(X), thereby obtaining a polynomial δ(X) with coefficients in (−p_(t), p_(t)]; computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i),F_(j)(X)); subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).
 7. The method of claim 4, where the one or more operations comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, and where performing the modulus switching operation comprises scaling down each element a(X) of the m-th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises: setting ā(X) to be a coefficient representation of a(X) mod p_(t); adding or subtracting multiples of p_(t) to every coefficient of ā(X), thereby obtaining a polynomial δ(X) where all the coefficients of δ(X) are divisible by an integer r, where r is co-prime with p_(t); computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i),F_(j)(X)); subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).
 8. A computer system, comprising: one or more memories comprising computer-readable program code; and one or more processors, wherein the one or more processors are configured, responsive to execution of the computer-readable program code, to cause the computer system to perform: performing a homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using an encryption scheme that includes a plurality of integer moduli, where each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer, where each ciphertext which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q, and where content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X), and where performing the homomorphic evaluation of the function further comprises performing one or more operations using one or more matrices from one or more of the ciphertexts.
 9. The computer system of claim 8, where the one or more operations comprise homomorphic multiplication operations of two ciphertexts performed by entry-by-entry multiplication of matrices from the two ciphertexts.
 10. The computer system of claim 8, where the one or more operations comprise automorphism of a ciphertext performed by permuting columns of the matrices from the ciphertext.
 11. The computer system of claim 8, where the plurality of moduli consist of products of smaller primes p_(i), where the t-th modulus q_(t) is the product of the first t smaller primes, q_(t)=Π_(i=1) ^(t)p_(i).
 12. The computer system of claim 11, where for each small prime p_(i), p_(i)−1 is divisible by m, where m is an integer defining the m-th cyclotomic number field.
 13. The computer system of claim 11, where the one or more operations comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, and where performing the modulus switching operation comprises scaling down each element a(X) of the m'th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises: setting ā(X) to be a coefficient representation of a(X) mod p_(t); performing one of adding or subtracting p_(t) from every odd coefficient of ā(X), thereby obtaining a polynomial δ(X) with coefficients in (−p_(t), p_(t)]; computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i),F_(j)(X)); subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).
 14. The computer system of claim 11, where the one or more operations comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, and where performing the modulus switching operation comprises scaling down each element a(X) of the m-th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises: setting ā(X) to be a coefficient representation of a(X) mod p_(t); adding or subtracting multiples of p_(t) to every coefficient of ā(X), thereby obtaining a polynomial δ(X) where all the coefficients of δ(X) are divisible by an integer r, where r is co-prime with p_(t); computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i),F_(j)(X)); subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).
 15. A computer program product comprising a computer readable storage medium having program code embodied therewith, the program code readable and executable by a computer to cause the computer to perform: performing a homomorphic evaluation of a function on one or more input ciphertexts, where the one or more input ciphertexts were encrypted using an encryption scheme that includes a plurality of integer moduli, where each ciphertext contains one or more elements of an m-th cyclotomic number field, where m is an integer, where each ciphertext which is defined relative to one of the moduli q, each element a(X) of the m-th cyclotomic number field is represented via a matrix, with each row i of the matrix corresponding to an integer factor p_(i) of the modulus q and each column j corresponding to a polynomial factor F_(j)(X) of the m-th cyclotomic polynomial Φ_(m)(X) modulo q, and where content of the matrix in row i and column j corresponds to the element a(X) modulo p_(i) and F_(j)(X), and where performing the homomorphic evaluation of the function further comprises performing one or more operations using one or more matrices from one or more of the ciphertexts.
 16. The computer program product of claim 15, where the one or more operations comprise homomorphic multiplication operations of two ciphertexts performed by entry-by-entry multiplication of matrices from the two ciphertexts.
 17. The computer program product of claim 15, where the one or more operations comprise automorphism of a ciphertext performed by permuting columns of the matrices from the ciphertext.
 18. The computer program product of claim 15, where the plurality of moduli consist of products of smaller primes p_(i), where the t-th modulus q_(t) is the product of the first t smaller primes, q_(t)=Π_(i=1) ^(t)p_(i), and where for each small prime p_(i), p_(i)−1 is divisible by m, where m is an integer defining the m-th cyclotomic number field.
 19. The computer program product of claim 15, where the plurality of moduli consist of products of smaller primes p_(i), where the t-th modulus q_(t) is the product of the first t smaller primes, q_(t)=Π_(i=1) ^(t)p_(i), where the one or more operations comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, and where performing the modulus switching operation comprises scaling down each element a(X) of the m'th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises: setting ā(X) to be a coefficient representation of a(X) mod p_(t); performing one of adding or subtracting p_(t) from every odd coefficient of ā(X), thereby obtaining a polynomial δ(X) with coefficients in (−p_(t), p_(t)]; computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i), F_(j)(X)); subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).
 20. The computer program product of claim 15, where the plurality of moduli consist of products of smaller primes p_(i), where the t-th modulus q_(t) is the product of the first t smaller primes, q_(t)=Π_(i=1) ^(t)p_(i), where the one or more operations comprise performing a modulus switching operation from q_(t) to q_(t-1) on a ciphertext, and where performing the modulus switching operation comprises scaling down each element a(X) of the m-th cyclotomic number field in the ciphertext by a factor of p_(t)=q_(t)/q_(t-1), where the operation of scaling comprises: setting ā(X) to be a coefficient representation of a(X) mod p_(t); adding or subtracting multiples of p_(t) to every coefficient of ā(X), thereby obtaining a polynomial δ(X) where all the coefficients of δ(X) are divisible by an integer r, where r is co-prime with p_(t); computing the representation of the polynomial δ(X) by a matrix of elements δ_(ij)(X), where the element in row i and column j of the matrix is computed as δ(X) modulo the i'th small prime p_(i) and the j'th polynomial factor F_(j)(X) of the cyclotomic polynomial Φ_(m)(X) modulo p_(i), δ_(ij)(X)=δ(X) mod (p_(i),F_(j)(X)); subtracting δ(X) from a(X), setting ã(X)=a(X)−δ(X); and dividing ã(X) by p_(t), setting a′(X)=ã(X)/p_(t), and outputting a′(X).